Historical Linguistics by Darkstar

Revised linguistic and ethnological theories and hypotheses, surveys,
(un)official articles, new ideas, maps, criticism, latest news in historical linguistics




Indo-European

46 Basic Lexemes in All Indo-European Language Groups (Big Scary Html Table)
Took months to collect, and months to correct. A sort of bird's-eye view of the Indo-European lexis.
All the Indo-European languages have been divided into about 35 (sub)groups, whose basic vocabulary is seen as considerably different from that of their neighbors. The internal classification and ordering of some of these (sub)groups have been modified and rearranged based on the newly found lexical proximity; otherwise, just factual data (last corrected and updated 09.2008).

The Lexicostatistical Distance Matrix Extracted from the Big Scary Table
Manually counted proximity values that determine the approximate phylogenetic relationship among the Indo-European languages (see below) (2008).
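For readers who would rather not count by hand, here is a minimal sketch of how such proximity values could be computed automatically. It is not the procedure used for the matrix above, which was counted manually. It assumes the table has already been coded into cognate classes, and the language labels "A", "B", "C" and their class numbers are hypothetical placeholders, not data from the Big Table.

```python
from itertools import combinations

# Hypothetical toy data (NOT taken from the Big Table): each language maps to
# a list of cognate-class labels, one per meaning slot, in the same order for
# every language. None marks a missing entry.
COGNATE_CLASSES = {
    "A": [1, 1, 1, 1],
    "B": [1, 1, 2, 1],
    "C": [1, 2, 3, None],
}


def shared_cognate_share(a, b):
    """Fraction of meaning slots attested in both lists that share a cognate class."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        raise ValueError("no comparable meaning slots")
    return sum(x == y for x, y in pairs) / len(pairs)


def proximity_matrix(table):
    """Symmetric matrix (nested dicts) of shared-cognate shares for all language pairs."""
    names = sorted(table)
    matrix = {name: {name: 1.0} for name in names}
    for p, q in combinations(names, 2):
        share = shared_cognate_share(table[p], table[q])
        matrix[p][q] = matrix[q][p] = share
    return matrix


if __name__ == "__main__":
    for name, row in proximity_matrix(COGNATE_CLASSES).items():
        print(name, {other: round(value, 2) for other, value in sorted(row.items())})
```

Skipping slots with missing entries, rather than counting them as mismatches, is one possible design choice; it keeps poorly attested languages from looking artificially distant.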

A Lexicostatistical Classification of Main Indo-European Language Groups (draft notes)
This lengthy article discusses lexicostatistical results obtained from the Big Table using both statistical and classical argumentation, and attempts to build a revised phylogeny of the Indo-European languages. It also includes some tentative attempts at a corrected glottochronology (2008).

A Detailed Map of Indo-European Migrations
The map is based on the idea that all historical migrations tend to spread symmetrically in a wave pattern, following the most suitable routes and avoiding impenetrable geographical regions (see Theory of Wave Migrations). Otherwise, generally accepted stuff, for the most part. Free to use for illustrative purposes. A brief discussion of the datings and locations of the main group proto-languages is included (last corrected and updated 05.2009).

A Theory of Wave Migrations and A Theory of the Indo-European Homeland near the Caspian Sea
This was my early, rather naive work (2005). However, I no longer maintain the Caspian hypothesis, since the "circles" may also be traced to the area near the Black Sea, thus almost confirming Gimbutas' classical model. Still, the file may have a few interesting points.

Indo-European Numbers (1-10), and the Internal Classification of Indo-European Languages
See how the Romance, Germanic, etc. classifications can be clarified by using just a small set of numbers. Why? Because numbers have very stable semantics, therefore all sound laws, phonological transitions, and errors from old-fashioned classifications become readily visible. The table is based on Mark Rosenfelder's glorious and outstanding dataset; colors usually indicate proximity (red is "closer to a proto-form", green and blue are "more innovative").

The Comparative Indo-European Data Corpus (Isidore Dyen et al.) (doc)
A dataset by Isidore Dyen et al. (1980s?, published in 1992) rearranged, shortened, and colored to produce a more human-readable output; unstable lexemes deleted. This dataset inspired my work on the Big Table. Curiously, Dyen, who sadly passed away in 2008 at the age of 96, said that his main goal in doing this research was to test whether the lexicostatistical method was even workable and applicable to the study of Austronesian languages, in which he took particular interest. Obviously, it has far wider application.

Indo-European Wordlists (Ringe et al.) (pdf)
An Indo-European dataset by Don Ringe (2002) based on the enhanced 301-427-lexeme Swadesh lists for 24 languages with some additional culturally significant lexemes.

On the Internal Classification of Indo-European Languages (Blazhek, 2005) (pdf)
A good survey by Blazhek (2005) investigating the classification of Indo-European languages, including all major groupings (Germanic, Slavic, Iranian, etc.) with lots of detailed trees.

 

Balto-Slavic

Balto-Slavic Lexicostatistics, Satemization, Palatalizations, etc (doc, in Russian)
The article attempts to explore phonological transitions in the Baltic and Slavic languages and their lexical proximity. They are close, but the Baltic languages still form a separate subgroup within the Balto-Slavic group; we could also draw a rough analogy with the modern Iranian languages, which constitute a complex gamut of multiple linguistic and genetic variations.

Lithuanian-Latvian-Russian 100-word Swadesh List (doc) (1)

Lithuanian-Latvian-Russian 100-word Swadesh List (doc) (2)
Over and over again, the lexicostatistical comparison demonstrates the close proximity of Slavic to Baltic, although Lithuanian and Latvian are even closer to each other. Nihil novi.

A Lexicostatistical Tree of Slavic Languages (gif, in Russian)
Based on Girdenis-Maziulis' lexicostatistical research (1994), which seems quite fair.

A Continuum of Slavic Languages (gif, in English)
Based on Girdenis-Maziulis' lexicostatistical research (1994). The Slavic languages are placed geographically, which makes it basically a wave model instead of a tree model. The length of each line is proportional to the lexicostatistical distance. Conclusion: the traditional East-West-South grouping of Slavic languages seems to be completely arbitrary and outdated—who made it up, anyway? Generally speaking, for closely related languages, we should rather stick to wave models. Similar objections to the "tripartite division" of Slavic were noted by Dyen (1992).

Cognates in modern English and Russian (doc, in English) NEW
How many cognates would we spot in two distantly related Indo-European languages if we had no previous knowledge of the classical languages? Would we even realize that English is Indo-European, if it were the only Germanic language left? A simple search shows that there are hardly more than 150-200 identifiable words in English and Russian remaining from the original PIE stock. (01.2010)


Indo-Iranian

A Map of Early Migrations of Indo-Iranian peoples
A rather detailed map of Indo-Iranian migrations which places the Indo-Iranian and Proto-Iranian Homeland near Bactria.
There may be two plausible hypotheses regarding the arrival of the early Indo-Iranians: (1) they may have traveled eastwards along the Oxus (Amu Darya River), which at the time may have flowed into the Caspian Sea, or (2) they may have traveled from the Aral Sea region along the Yaxartes (Syr Darya). In both cases, they finally reached the Pamir Mountains and the present-day province of Nuristan, where they split up and migrated into northern India. (2008)

Indo-Iranian Numbers (1-10), and the Internal Classification of Indo-Iranian Languages
The table is based on Mark Rosenfelder's dataset; colors normally indicate proximity (red is "closer to a proto-form", green and blue are "more innovative").

A Short List of Words in Iranian Languages
The table is based on wiki lists. Colors indicate proximity. Just for consideration.

A Tree of Iranian Languages (gif)
A lexicostatistical Iranian dendrogram by Starostin (2004) and its version corrected for possible statistical deviations (however, his recalibrated glottochronology still seems to yield datings not corroborated by other research).

 

Turkic

The Turkic Languages in a Nutshell (big page) (ver. 4.1)
A bird's-eye view of the Turkic peoples and their languages with many photographs. Also, a detailed internal classification of the Turkic language group based on phonological transitions in 1-10 numbers, 9 carefully selected basic words, as well as other historical and linguistic evidence (see below). Reviewed by Yusuf B. Gürsey. Basically, this web article is intended to be a concise primer on Turkic ethnolinguistics (first online 04.2009, last updated 10.2009).

The Internal Classification and Migrations of Turkic Languages (ver. 4.2)
Theoretical considerations and explanations of the above-mentioned taxonomy. A new classification of Turkic languages has been built from scratch on the basis of grammatical, phonological, and lexicostatistical research, including a study of the early migration routes of Turkic peoples in the Altai-Sayan mountain system and a quite earnest glottochronological study (see below) (first online 04.2009, last updated 12.2009).

A Dendrogram of the Turkic Languages (gif)
Included in the pages above; based upon combined research.

Iranian–Turkic–Mongolic Regular Correspondences — Looks Strange!
This table, containing about 100 lexemes that mostly reflect regular correspondences in accordance with the classical principles of the comparative method, attempts to demonstrate that the Turkic languages belong to the Eastern branch of the Indo-European family (sic!). (An opposing view states that these are nothing but Nostratic correspondences.) Note that most lexical items are semantically simple and historically stable. Moreover, some genetic (non-linguistic) evidence is added at the end of the work. Conclusion: the Turkic languages could be a lost "satemized" branch of the Indo-European family, most likely of the Indo-Iranian or early Iranian stock, although highly modified by intense lenition and looking very different from other Indo-Iranian languages. [Incidentally, *satem (hundred) was *Ser or similar in Proto-Turkic.] For some reason, East Iranian languages seem to be particularly close to Turkic; it may be due to the presence of a common substratum. The whole matter could have something to do with an early separation of Proto-Turkic c. 2500-3000 BC, but it's still hard to tell. There's no contradiction with the Altaic theory (that's precisely why the Mongolic data were included); quite to the contrary, the Proto-Turkic and Proto-Mongolic proto-forms seem to be in good agreement. (2008, last updated 04.2009).

Armenian–Turkic Correspondences – Even Weirder!
A lengthy table. Part of the Turkic-vs-Indo-European theory (see above). When I realized that Proto-Turkic might be related to IE languages, I began to scan different IE groups for similarities with the Turkic languages, and Armenian was part of that research. It seems to fit nicely – up to 40% in the Swadesh-100. Note the high semantic stability of all lexical items. There's no clear historical explanation for this observation so far. (2007)



Other Altaic

Some Mongolic/Tungusic Correspondences NEW
The table is based on Starostin's datasets. It shows that Tungusic and Mongolic have many nearly regular correspondences and are indeed related (up to 38-40% in Swadesh-100, counted elsewhere). A brief description of Tungusic languages included (10.2009).



Semitic

Semitic Numbers (1-10), and the Internal Classification of Semitic Languages
The table is based on Mark Rosenfelder's dataset. It clarifies the internal structure of the Semitic languages.

Comparing Basic Lexemes in PIE and Semitic Languages – They Seem Quite Close
Nostratics is yet another Big Bad Wolf for some of those poor conservative riding hoods out there. Just grab some basic nouns, and check it out for yourself. Generally, Proto-Semitic seems to be more closely related to PIE than the southern Afro-Asiatic groups. Starostin, by the way, reached the same conclusion in one of his articles (unpublished in English?).

 

Other Afro-Asiatic

The Terrible Mess of Afro-Asiatic Numbers (1-10) and an Attempt to Get it Straightened Out
The table is based on Mark Rosenfelder's dataset; colors indicate proximity (red is "closer to a proto-form", green and blue are "more innovative"). Numbers provide little evidence for a clear-cut separation between Omotic and Cushitic; consequently, all the "Omo-Cushitic" internal groupings had to be sorted out and rearranged anew. The Berber numbers are rather regular and predictable; by contrast, the Beja group has non-Cushitic numbers, and the Chadic numbers are so messed up they were just left out. (2007)



Linguistic Geography

Dialect and Language Borders Seem to Partly Coincide with Geographic Features
This is a hypothesis that suggests the existence of a rather natural social phenomenon resulting from the human tendency to adapt to the physical environment and occupy a particular limited territory, which is then seen as their "homeland". As a result, say, fishermen would tend to inhabit the seashore and develop their own language over time, hunters would inhabit inland forests and have their own language, farmers would occupy arable land, etc. Consequently, all linguistic groups tend to acquire a natural physical habitat. Other non-geographical factors should not be downplayed, however. (2005)



Linguistic Typology

Could There Be Giant Typological Linguistic Areas Spreading Across the Eastern Hemisphere?
Conclusion: there could, but not necessarily of a genetic type—secondary convergence is more likely. Even though it would be very tempting to think these areas represent some kind of traces of ancient migrations extending historically beyond the Swadesh horizon, there is no direct evidence for this. Still, the representation looks fine and could be used for mnemonic, illustrative and other purposes (2006).



Lexicostatistics

How Lexicostatistics Can Show English and Spanish Are Related, While Yoruba Is Not
Just tired of that anti-mass-comparison witch-hunt stuff now repeated by every fool at every corner, so here's a drop of cold water on it.
No, by "fool" I don't mean Rosenfelder, but his web article is cited so often, both rightfully and out of context, that I just thought an opposing view deserved some consideration (2007-2008) (a draft article).

The Classification of the World Languages Using Automated Phonostatistics (by ASJP, 04.2009) (pdf)
You should see it! Here's the frontline of 21st-century historical linguistics. What these guys did was take 40 basic words in each language of the world, transcribe them phonetically, and compare them phoneme by phoneme using a simple program. You get a tree that is at least 80-90% correct, depending on the quality of your algorithm and your word lists. (Actually, they use the much too simplified Levenshtein algorithm, but that's just one of many possible ways.) No more painstaking manual comparativism! You can call this method phonostatistical; it's entirely new. (I initially began doing exactly that in 2007 with my IE list-46, but was unable to finish the computer program, since I'm no pro programmer. Well, anyway, great minds think alike.) Outstanding research! For instance, no matter how simple their method is, their IE dendrogram is close to the real one. Just for consideration: Armenian is finally detached from Greek and placed into Satem, where it is probably supposed to be; Welsh and Irish are correctly shown as strongly differentiated; their Omotic-Cushitic tree is largely similar to the taxonomy I have obtained using a manual comparison of 1-10 numbers – and that's just the first-order approximation. This is in fact the same revolution of computerized statistical methods that has already taken place in molecular biology, now reaching linguistics. Words are DNA, phonemes are nucleotide bases, in the best spirit of Dawkins' analogy between genes and memes. Great job! Way to go!
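To give a rough idea of what "compare them phoneme by phoneme" can mean in practice, here is a minimal sketch of a length-normalized Levenshtein distance averaged over two aligned word lists. This is an illustration only, not the ASJP pipeline itself: their transcription scheme and further normalization steps are not reproduced here, each character is assumed to stand for one phoneme, and the sample "words" are made-up strings.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = cur
    return prev[-1]


def normalized_distance(a, b):
    """Edit distance divided by the length of the longer word (0.0 to 1.0)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def list_distance(words_a, words_b):
    """Average normalized distance over two word lists aligned by meaning."""
    if not words_a or len(words_a) != len(words_b):
        raise ValueError("word lists must be non-empty and of equal length")
    total = sum(normalized_distance(x, y) for x, y in zip(words_a, words_b))
    return total / len(words_a)


if __name__ == "__main__":
    # Made-up transcriptions, one character per phoneme (an assumption).
    print(list_distance(["pat", "tak"], ["pat", "tok"]))
```

Running `list_distance` over every pair of languages gives a distance matrix, which any standard tree-building method can then turn into a dendrogram.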


The Method of Glottochronological Corrections & the Glottochronology of Turkic Languages NEW
An article explaining how to improve the classical Swadesh glottochronology when a large set of languages is used!
A simple PHP program to avoid manual calculations is included. (10.2009)
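For reference, the classical Swadesh formula that the article sets out to improve estimates the time since two languages diverged as t = ln(c) / (2 ln(r)), where c is the share of cognates the two lists have in common and r is the retention rate per millennium. Here is a minimal sketch of that baseline only; the corrections proposed in the article are not reproduced, and the default r = 0.86 is the value commonly quoted for the 100-word list, used here as an assumption.

```python
import math


def divergence_time_millennia(shared_share, retention_rate=0.86):
    """Classical Swadesh estimate of time since divergence, in millennia.

    shared_share   -- share of cognates common to both lists (0 < c <= 1)
    retention_rate -- share of the list retained per millennium (0 < r < 1);
                      0.86 is an assumed default for the 100-word list
    """
    if not 0 < shared_share <= 1:
        raise ValueError("shared_share must be in (0, 1]")
    if not 0 < retention_rate < 1:
        raise ValueError("retention_rate must be in (0, 1)")
    return math.log(shared_share) / (2 * math.log(retention_rate))


if __name__ == "__main__":
    # Two lists sharing 0.86 ** 2 of their items come out one millennium apart.
    print(divergence_time_millennia(0.86 ** 2))
```

The factor of 2 is there because both languages have been losing vocabulary independently since the split.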



Other notes

In case you can't get through occasionally, it's a problem with the hosting (runhosting.com); apparently, they limit the number of visitors artificially, trying to bulldoze people into paying. Geocities, where I used to stay, the Titanic of the golden era, was using exactly the same tactics and finally sank to the bottom, making me move in here in a lifeboat, but apparently it's the same story.

 

 

enstnew at mail ru

2005-2009