Historical Linguistics by Darkstar

Revised linguistic and ethnological theories and hypotheses, surveys, studies, (un)official articles, drafts, preprints, new ideas, maps, criticism, news in historical linguistics

Indo-European

46 Basic Lexemes in All Indo-European Language Groups (Big Scary HTML Table)
Took months to collect and months to correct. A sort of bird's-eye view of the Indo-European basic vocabulary.
All the Indo-European languages have been divided into about 35 (sub)groups, whose basic lexis is seen as considerably different from that of their neighbors. The internal classification and ordering of some of these (sub)groups have been modified and rearranged based on the newly found lexical proximity; otherwise, just factual data (last corrected and updated 09/2008).

The Lexicostatistical Distance Matrix Extracted from the Big Scary Table
Manually counted proximity values that determine the approximate phylogenetic relationship among the Indo-European languages (see below) (2008).
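For readers who want to reproduce such values automatically rather than by hand, here is a minimal Python sketch. It assumes each language is coded as a list of cognate-class labels, one per meaning slot (`None` for a missing slot). The function name `shared_cognate_matrix` and the toy data are hypothetical and only illustrate the arithmetic: the proximity of two languages is the percentage of comparable slots where they share a cognate class, and the distance is 100 minus that value.

```python
from itertools import combinations


def shared_cognate_matrix(cognates):
    """Percentage of shared cognates for every pair of languages.

    cognates: dict mapping a language name to a list of cognate-class labels,
    one per meaning slot; None marks a missing slot.
    Returns a dict mapping (lang_a, lang_b) to the percentage of comparable
    slots in which both languages have the same cognate class.
    """
    result = {}
    for a, b in combinations(sorted(cognates), 2):
        # Only compare slots that are filled in both lists.
        pairs = [(x, y) for x, y in zip(cognates[a], cognates[b])
                 if x is not None and y is not None]
        if not pairs:
            continue
        shared = sum(1 for x, y in pairs if x == y)
        result[(a, b)] = 100.0 * shared / len(pairs)
    return result


# Hypothetical toy data: 4 meaning slots, arbitrary cognate-class IDs.
toy = {
    "LangA": [1, 2, 3, 4],
    "LangB": [1, 2, 5, None],
    "LangC": [6, 2, 5, 7],
}
for pair, pct in shared_cognate_matrix(toy).items():
    print(pair, round(pct, 1), "distance:", round(100 - pct, 1))
```

Here LangA and LangC share 25% (distance 75), while the other two pairs share about 66.7%, because the slot missing in LangB is simply left out of the denominator.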

A Lexicostatistical Classification of Main Indo-European Language Groups (draft notes)
This lengthy article discusses the lexicostatistical results obtained from the Big Table using both statistical and classical argumentation. It attempts to build a revised phylogeny of the Indo-European languages and also makes some tentative attempts at methodologically correct glottochronology (08/2008).
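One standard way to turn a lexicostatistical distance matrix into a tree is UPGMA (average-linkage clustering). The sketch below is not necessarily the procedure used in the article; it just illustrates the idea, assuming distances are stored in a dict keyed by `frozenset({a, b})`. The function name, leaf names and distance values are hypothetical.

```python
from itertools import combinations


def upgma(labels, dist):
    """Average-linkage (UPGMA) clustering into a nested-tuple tree.

    labels: list of leaf names.
    dist: dict mapping frozenset({a, b}) to the distance between leaves a and b.
    Returns a nested tuple such as ('D', ('C', ('A', 'B'))).
    """
    # Each cluster is a pair: (subtree, list of leaves it contains).
    clusters = [(lab, [lab]) for lab in labels]

    def avg(c1, c2):
        # UPGMA distance = mean of all leaf-to-leaf distances between clusters.
        total = sum(dist[frozenset((x, y))] for x in c1[1] for y in c2[1])
        return total / (len(c1[1]) * len(c2[1]))

    while len(clusters) > 1:
        # Pick the closest pair of clusters (the first one wins on ties).
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: avg(clusters[ij[0]], clusters[ij[1]]))
        merged = ((clusters[i][0], clusters[j][0]),
                  clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]


# Hypothetical distances (100 minus percent of shared cognates).
d = {frozenset(p): v for p, v in [
    (("A", "B"), 20), (("A", "C"), 60), (("B", "C"), 64),
    (("A", "D"), 80), (("B", "D"), 82), (("C", "D"), 78),
]}
print(upgma(["A", "B", "C", "D"], d))
```

UPGMA implicitly assumes roughly equal rates of lexical replacement on all branches, the same assumption that makes glottochronology tricky; neighbor-joining is a common alternative when rates are suspected to vary.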

A Detailed Map of Indo-European Migrations
The map is based on the idea that all historical migrations tend to spread symmetrically in a wave pattern, following the most suitable routes and avoiding impenetrable geographical regions (see The Theory of Wave Migration). Otherwise, generally accepted stuff, for the most part. A brief discussion of dates and locations of main group proto-languages is included
(last corrected and updated 05/2009). Free to use for illustrative purposes. BTW, this map and some of the Balto-Slavic data below were used as educational materials by Vyacheslav Ivanov (the famous coauthor of the glottalic theory) in a TV lecture on the Rossiya-K (Culture) Channel (2010), which may answer the question people often ask about the usability and credibility of the materials on this site (though, of course, it is always up to the reader to decide what's hot and what's not).
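The wave idea can be illustrated with a toy model: a breadth-first flood fill over a grid map, where each step of the search is one "wave front" spreading symmetrically from a homeland, and impassable cells (mountains, deserts, seas) are simply skipped. This is a hypothetical sketch for illustration only; the grid, the `'#'` convention and the function name `wave_spread` are assumptions, not the procedure used to draw the map.

```python
from collections import deque


def wave_spread(grid, start):
    """Breadth-first 'wave' spreading over a grid map.

    grid: list of equal-length strings; '#' marks an impassable cell,
          any other character is passable.
    start: (row, col) of the hypothetical homeland.
    Returns a dict mapping each reachable (row, col) to the wave step
    at which it is first reached.
    """
    rows, cols = len(grid), len(grid[0])
    reached = {start: 0}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        # Spread to the four neighbouring cells.
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] != '#' and (nr, nc) not in reached):
                reached[(nr, nc)] = reached[(r, c)] + 1
                queue.append((nr, nc))
    return reached


# Hypothetical map: a mountain ridge ('#') that the wave has to go around.
toy_map = [
    ".....",
    ".###.",
    ".....",
]
steps = wave_spread(toy_map, (1, 0))
print(steps[(1, 4)])
```

The cell on the far side of the ridge is reached only after the wave has flowed around it on both sides. Weighting cells by how easy they are to cross (so that migrations prefer the most suitable routes) would turn this into Dijkstra's algorithm over the same grid.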

Indo-European Numbers (1-10), and the Internal Classification of Indo-European Languages
See how the Romance, Germanic, etc. classifications can be clarified using just a small set of numbers. Why? Because numbers have very stable semantics, so all sound laws, phonological transitions, and errors in old-fashioned classifications become readily visible. The table is based on Mark Rosenfelder's glorious and outstanding dataset; colors usually indicate proximity (red is "closer to a proto-form", green and blue are "more innovative").

A Theory of Wave Migration and A Theory of the Indo-European Homeland near the Caspian Sea
This was an early, rather naive work, though some parts may still be of interest. However, I no longer maintain the "Caspian hypothesis", since the circles may also be traced to the area near the Black Sea, which almost confirms Gimbutas' classical model (08/2005).

The Comparative Indo-European Data Corpus (Isidore Dyen et al.) (doc)
A dataset by Isidore Dyen et al. (the 1980s?, published in 1992), rearranged, shortened, and colored to produce a more human-readable output; unstable lexemes deleted. This dataset inspired my work on the Big Table. Curiously, Dyen, who sadly passed away in 2008 at the age of 96, said that the main goal of his research was to test whether the lexicostatistical method was even workable and applicable to the study of Austronesian languages, in which he took particular interest. Obviously, his research had a wider application.

Indo-European Wordlists (Ringe et al.) (pdf)
An Indo-European dataset by Don Ringe (2002) based on the enhanced 301-427-lexeme Swadesh lists for 24 languages with some additional culturally significant lexemes.

On the Internal Classification of Indo-European Languages (Blazhek, 2005) (pdf)
A good survey by Blazhek (2005) investigating the classification of Indo-European languages, including all major groupings (Germanic, Slavic, Iranian, etc.), with lots of detailed trees.

 

Balto-Slavic

Balto-Slavic Lexicostatistics, Satemization, Palatalizations, etc. (doc, in Russian) (Updated)
This article attempts to explore phonological transitions in the Baltic and Slavic languages and their lexical proximity. They are close, but the Baltic languages still form a separate subgroup within the Balto-Slavic group; a rough analogy can be drawn with the modern Iranian languages, which constitute a complex gamut of multiple linguistic and genetic variations. (2009-11/2011)

Lithuanian-Latvian-Russian 100-word Swadesh List (doc) (1)
Lithuanian-Latvian-Russian 100-word Swadesh List (doc) (2)
Over and over again, the lexicostatistical comparison demonstrates the close proximity of Slavic to Baltic, although Lithuanian and Latvian are even closer. Nihil novi.

A Lexicostatistical Tree of Slavic Languages (gif, in Russian)
Based on Girdenis-Maziulis' lexicostatistical research (1994), which seems quite fair.

A Continuum of Slavic Languages (gif, in English)
Based on Girdenis-Maziulis' lexicostatistical research (1994). The Slavic languages are placed geographically, which makes it basically a wave model instead of a tree model. The length of each line is proportional to the lexicostatistical distance. Conclusion: the traditional East-West-South grouping of Slavic languages seems to be completely arbitrary and outdated—who made it up, anyway? Generally speaking, for closely related languages, we should rather stick to wave models. Similar objections to the "tripartite division" of Slavic were noted by Dyen (1992).

Cognates in modern English and Russian (doc, in English)
How many cognates would we spot in two distantly related Indo-European languages if we had no previous knowledge of the classical languages? Would we even realize that English is Indo-European if it were the only Germanic language left? A simple search shows that hardly more than 150-200 identifiable words shared by English and Russian remain from the original PIE stock. (01/2010)

 


Indo-Iranian

A Map of Early Indo-Iranian Migrations
A rather detailed map of Indo-Iranian migrations which places the Indo-Iranian and Proto-Iranian Homeland near Bactria.
There are two plausible hypotheses regarding the arrival of the early Indo-Iranians: (1) they may have traveled eastwards along the Oxus (Amu Darya River), which at the time may have flowed into the Caspian Sea; or (2) they may have traveled from the Aral Sea region along the Yaxartes (Syr Darya). In both cases, they finally reached the Pamirs and the present-day province of Nuristan, where they split up and migrated into northern India. (1/2006, last update 2008)

Indo-Iranian Numbers (1-10), and the Internal Classification of Indo-Iranian Languages
The table is based on Mark Rosenfelder's dataset; colors normally indicate proximity (red is "closer to a proto-form", green and blue are "more innovative"). The numbers simply help to show what a realistic (as opposed to the "classical") internal classification should look like.

A Short List of Words in Iranian Languages
The table is based on wiki lists. Colors indicate proximity. Just for consideration.

A Tree of Iranian Languages

A lexicostatistical Iranian dendrogram by Starostin (2004) and a version of it corrected for possible statistical deviations. However, his recalibrated glottochronology still seems to yield dates not corroborated by other research.

 

Turkic

The Turkic Languages in a Nutshell (ver. 6.3) (Updated again)
A bird's-eye view of the Turkic peoples and their languages, with word examples, photographs and songs. The detailed internal classification of the Turkic language group is based on the multilateral research described in detail below. Reviewed by Yusuf B. Gürsey (2009-10). Basically, this web article is intended to be a concise primer on Turkic ethnolinguistics (first online 04/2009, updated 11/2010, updated 12/2011, last updated 04-05/2012).

The Internal Classification and Migration of Turkic Languages (ver. 7.3) (Updated again)
This article includes all the theoretical considerations for the above-mentioned taxonomy. Essentially, a new classification of Turkic languages has been built from scratch on the basis of grammatical, phonological, lexicostatistical and historical research, including several studies of the early migration routes of Turkic peoples. (First online 04/2009, major changes by 12/2009, updated 11/2010, major update 12/2011, last updated 04-05/2012).

The Lexicostatistics and Glottochronology of Turkic Languages (ver. 3) (Updated)
A classical lexicostatistical study of 15 Turkic languages was conducted using 215-word Swadesh lists, originally prepared for Wiktionary.org and then expanded and checked for possible semantic errors and borrowings. The calculations were aided by a custom PHP application. An older version (2009-11) with "glottochronological corrections" has been removed due to the internal complexities of that methodology. (First online 10-12/2009, changes by 11-12/2011, major update 04/2012).

The Proto-Turkic Urheimat and The Early Migrations of Turkic Peoples (ver. 3.1) (New)
A multilateral linguistic, geographical, historical and archaeological analysis of Bulgaro-Turkic and Proto-Turkic was performed in an attempt to locate the Urheimat and trace the earliest routes of Turkic migrations. The main part of this article was originally included in The Internal Classification [...] but was later turned into a separate publication (First online 04/2009, changes in 11-12/2011, major update 04-05/2012).

A Dendrogram of the Turkic Languages (Updated)
Included in the pages above (updated 12/2011, last updated 04/2012).

 

Other Altaic

Some Mongolic/Tungusic Correspondences
The table is based on Starostin's datasets. It shows that Tungusic and Mongolic have many nearly regular correspondences and are indeed related (up to 38-40% in Swadesh-100; counted elsewhere). A brief description of Tungusic languages is also included. (10/2009).

 

Nostratic research

Comparing Basic Lexemes in PIE and Semitic Languages: They Seem Quite Close
Nostratics is yet another Big Bad Wolf for some of those poor conservative riding hoods out there. Just grab some basic nouns, and check it out for yourself. Generally, Proto-Semitic seems to be more closely related to PIE than the southern Afro-Asiatic groups. Starostin, by the way, reached the same conclusion in one of his articles (unpublished in English?) (2008).

Iranian–Turkic–Mongolic Regular Correspondences — Looks Strange!
This table contains about 100 lexemes, most of which reflect regular correspondences in accordance with the classical principles of the comparative method. For some reason, the East Iranian languages seem to be particularly close to Turkic. The results may have various interpretations: these may be nothing but Nostratic correspondences; or the common lexemes may be related to the presence of a common substratum; or the Turkic languages could be a lost "satemized" branch of the Indo-European family, most likely of Indo-Iranian or even early Iranian stock, although highly modified by intense lenition and looking very different from other Indo-Iranian languages. [Incidentally, *satem (hundred) was *Ser or similar in Proto-Turkic.] Some genetic evidence is also added at the end of the work. Note that most of these lexical items are semantically simple and historically stable. There's no contradiction with the Altaic theory (that's precisely why the Mongolic data were included); quite the contrary: the Proto-Turkic and Proto-Mongolic forms seem to be in good agreement. (2008, last updated 04/2009).
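A "regular" correspondence means that a given sound in one language systematically matches a given sound in the other across many word pairs, not just in one lucky look-alike. Below is a minimal Python sketch of how such regularity can be tallied, assuming a list of already compared word pairs and looking only at word-initial letters (a deliberate simplification; real work compares aligned segments in phonetic transcription). The function name and the placeholder forms are hypothetical and are not data from the table.

```python
from collections import Counter, defaultdict


def correspondence_table(pairs):
    """Tally word-initial sound correspondences between two languages.

    pairs: list of (form_in_lang_x, form_in_lang_y) compared word pairs.
    Returns {x_initial: (most_common_y_initial, share)}, where share is the
    fraction of x_initial's pairs that show that counterpart.
    """
    by_x = defaultdict(Counter)
    for x, y in pairs:
        if x and y:
            by_x[x[0]][y[0]] += 1
    table = {}
    for xi, counter in by_x.items():
        yi, n = counter.most_common(1)[0]
        table[xi] = (yi, n / sum(counter.values()))
    return table


# Hypothetical placeholder forms, not real Iranian or Turkic data.
toy_pairs = [("pa", "ba"), ("pi", "bi"), ("po", "mo"), ("ta", "da"), ("tu", "du")]
for sound, (counterpart, share) in correspondence_table(toy_pairs).items():
    print(sound, "->", counterpart, round(share, 2))
```

A share close to 1.0 across many pairs (as with t : d here) is what "regular" means; a scattered distribution suggests chance resemblance or borrowing.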

Armenian–Turkic Correspondences — Even Weirder!
A lengthy table, part of the Turkic-vs-Indo-European theory (see above). When I realized that Proto-Turkic might be related to IE languages, I began to scan different IE groups for similarities with the Turkic languages, and Armenian was part of that research. It seems to fit nicely: up to 40% in the Swadesh-100. Note the high semantic stability of all lexical items. There's no clear historical explanation for this observation so far, but evidently it has something to do with Nostratic correspondences (2007).



Semitic

Semitic Numbers (1-10), and the Internal Classification of Semitic Languages
The table is based on Mark Rosenfelder's dataset. It clarifies the internal structure of the Semitic languages.

Bayesian phylogenetic analysis of Semitic languages
A glottochronological study by Andrew Kitchen, Christopher Ehret et al. (04/2009). The authors performed a classical cognate-based lexicostatistical analysis of 96-word Swadesh lists for 25 Semitic languages, with local calibration and an automatically constructed internal phylogeny. To cut a long story short, they tentatively dated Proto-Semitic to about 3700 BCE, which seems to be more or less consistent with other estimates (e.g. that of ASJP).

 

Other Afro-Asiatic

The Terrible Mess of Afro-Asiatic Numbers (1-10) and an Attempt to Get it Straightened Out
The table is based on Mark Rosenfelder's dataset; colors indicate proximity (red is "closer to a proto-form", green and blue are "more innovative"). Numbers provide little evidence for a clear-cut separation between Omotic and Cushitic; consequently, all the "Omo-Cushitic" internal groupings had to be sorted out and rearranged in a completely new fashion. The Berber numbers are rather regular and predictable; by contrast, the Beja group has non-Cushitic numbers, and the Chadic numbers seemed so messed up that they were simply left out. (2007)



Linguistic Geography

Dialect and Language Borders Seem to Partly Coincide with Geographic Features
This hypothesis suggests the existence of a rather natural social phenomenon resulting from the human tendency to adapt to the physical environment and occupy a particular limited territory, which is then seen as their "homeland". As a result, say, fishermen would tend to inhabit the seashore and develop their own language over time, hunters would inhabit inland forests and have their own language, farmers would occupy arable land, etc. Consequently, all linguistic groups tend to acquire a natural physical habitat. Other non-geographical factors should not be downplayed, however. (2005)

The concept of "the Central Asian Bridge"
The existence of some sort of proto-Silk Road as a strip of arable land uniting West and East was suggested here c. 2005. Similar ideas about the ancient Silk Road were also developed at the Silk Road Symposium (2011) by Colin Renfrew et al.



Linguistic Typology

Could There Be Giant Typological Linguistic Areas Spreading Across the Eastern Hemisphere?
Conclusion: there could be, but not necessarily of a genetic type; secondary convergence is more likely. Even though it would be very tempting to think that these areas represent traces of ancient migrations extending historically beyond the Swadesh horizon, there is no direct evidence for that. Still, this representation looks fine and could be used for mnemonic, illustrative and other purposes (May 2006).



Lexicostatistics

How Lexicostatistics Can Show English and Spanish Are Related, While Yoruba Is Not
Just tired of that anti-mass-comparison witch-hunt stuff now repeated by every fool at every corner, so here's a drop of cold water on it.
No, by "fool" I definitely don't mean Rosenfelder, but his web article is cited so often, both rightfully and out of context, that I thought an opposing view deserved some consideration (preliminary notes from May 25, 2007 to 2008) (a permanent draft).
The subsequent research conducted by the ASJP group by 2009 took similar considerations several steps further.


Analyzing Genetic Connections between Languages by Matching Consonant Classes
by Peter Turchin, Ilia Peiros, Murray Gell-Mann (2010).
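Roughly speaking, the approach reduces each word to the classes of its first two consonants and counts how many meaning slots of two word lists agree, then checks that count against what chance would produce. Below is a simplified Python sketch in that spirit, with a hypothetical, approximate letter-to-class mapping over plain spelling (the paper itself works with Dolgopolsky-style consonant classes over phonetic transcription); the chance test is omitted, and all names here are illustrative assumptions.

```python
# Hypothetical, approximate mapping of plain letters to consonant classes,
# loosely modeled on Dolgopolsky-style classes. Letters not listed here
# (vowels, h, etc.) are ignored in this simplified sketch.
CLASSES = {
    **dict.fromkeys("pbfv", "P"),
    **dict.fromkeys("tdθð", "T"),
    **dict.fromkeys("szʃʒ", "S"),
    **dict.fromkeys("kgqx", "K"),
    "m": "M", "n": "N", "r": "R", "l": "R", "w": "W", "j": "J",
}


def consonant_skeleton(word, n=2):
    """Classes of the first n consonants of a word, as a tuple."""
    classes = [CLASSES[ch] for ch in word.lower() if ch in CLASSES]
    return tuple(classes[:n])


def match_score(list_a, list_b):
    """Number of aligned meaning slots whose consonant skeletons agree."""
    score = 0
    for a, b in zip(list_a, list_b):
        skel = consonant_skeleton(a)
        if skel and skel == consonant_skeleton(b):
            score += 1
    return score


english = ["father", "mother", "water"]
german = ["vater", "mutter", "wasser"]
print(match_score(english, german))
```

Here "water"/"Wasser" is missed because the High German consonant shift moved the medial t into the S class, a reminder that coarse class matching deliberately gives up some genuine cognates.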

The Classification of the World Languages Using Automated Phonostatistics (by ASJP, 04/2009) (pdf)
You should see this! Here's the frontline of 21st-century historical linguistics. What these guys did was take 40 basic words in each language of the world, transcribe them phonetically, and compare them phoneme by phoneme using a relatively simple program. As a result, you get a tree that is at least 80-90% correct, depending on the quality of your algorithm and your word lists. (Actually, they use the much too simplified Levenshtein distance, but that's just one of many possible ways.) No more painstaking manual comparativism! You can call this method phonostatistical; it's entirely new. (I had initially started doing exactly that in 2007 with my IE list-46, but was unable to finish the computer program, since I was no pro programmer. Well, anyway, great minds think alike.) Outstanding research! For instance, no matter how simple their method is, their IE dendrogram is close to reality. Just for consideration: Armenian is finally detached from Greek and placed into Satem, where it probably belongs; Welsh and Irish are correctly shown as strongly differentiated; their Omotic-Cushitic tree is largely similar to the taxonomy I obtained by manually comparing the numbers 1-10, and that's all just a first-order approximation. This is in fact the same computerized statistical revolution that has already taken place in molecular biology. Words are DNA, phonemes are nucleotide bases, in the best spirit of Dawkins's analogy between genes and memes. Great job! Way to go!
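The core of this kind of comparison can be sketched in a few lines of Python: a Levenshtein (edit) distance between two transcribed words, normalized by the length of the longer word, then averaged over aligned meaning slots. This is a simplified sketch, not ASJP's actual code; ASJP additionally divides by the average distance between words with different meanings (the LDND measure) to correct for chance similarity caused by similar sound inventories, a step omitted here. Function names are hypothetical.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions
    needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def ldn(a, b):
    """Levenshtein distance normalized by the longer word (0 = identical)."""
    return levenshtein(a, b) / max(len(a), len(b), 1)


def list_distance(list_a, list_b):
    """Mean normalized distance over meaning slots filled in both lists."""
    pairs = [(a, b) for a, b in zip(list_a, list_b) if a and b]
    if not pairs:
        raise ValueError("no comparable meaning slots")
    return sum(ldn(a, b) for a, b in pairs) / len(pairs)


print(levenshtein("kitten", "sitting"))
print(list_distance(["abc", "xy"], ["abc", "xz"]))
```

Computing `list_distance` for every pair of languages yields a distance matrix that can be fed to any tree-building method.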


The Fundamentals of the Lexicostatistics and Glottochronology (New)
This online article summarizes the proper methodological procedures for doing basic lexicostatistical and glottochronological research. (4/2012).
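As a reminder of the classical starting point, the Swadesh–Lees glottochronological formula estimates the separation time t (in millennia) from the fraction c of shared cognates, assuming a constant retention rate r per millennium in each lineage: t = ln c / (2 ln r). A minimal Python sketch, assuming r = 0.86, the value often quoted for the 100-word list; this is the classical uncorrected formula, not a recalibrated procedure such as Starostin's mentioned above, and the function name is hypothetical.

```python
import math


def swadesh_lees_date(c, r=0.86):
    """Classical glottochronological separation time in millennia.

    c: fraction of shared cognates between two languages (0 < c <= 1).
    r: assumed constant retention rate per millennium in each lineage.
    Both lineages lose words independently, hence the factor 2.
    """
    if not 0 < c <= 1:
        raise ValueError("c must be in (0, 1]")
    return math.log(c) / (2 * math.log(r))


print(round(swadesh_lees_date(0.7), 2))
```

With 70% shared cognates this gives roughly 1.18 millennia; the constant-rate assumption is exactly what later corrections try to relax.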

 

 

 

 

enstnew at mail ru

2005-2012