Revised
linguistic and ethnological theories and hypotheses, surveys, studies, (un)official
articles, drafts, preprints, new ideas, maps, criticism, news in historical linguistics
Indo-European
46
Basic Lexemes in All Indo-European Language Groups (Big Scary Html Table)
Took
months to collect, and months to correct. A sort of bird's-eye view on
the Indo-European basic vocabulary. All the Indo-European languages have
been divided into about 35 (sub)groups, whose basic lexis is seen as considerably
different from that of their neighbors. The internal classification and ordering
of some of these (sub)groups have been modified and rearranged based on the newly
found lexical proximity; otherwise, just factual data (last corrected and updated
09/2008).
A
Lexicostatistical Classification of Main Indo-European Language Groups (draft
notes) This
lengthy article discusses lexicostatistical results obtained from the Big Table
using both statistical and classical argumentation. It attempts to build a revised
phylogeny of the Indo-European languages. It also includes some tentative attempts
at a corrected glottochronology (08/2008).
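For readers unfamiliar with the term, here is a minimal sketch of the classical glottochronological formula, t = ln(c) / (2 ln(r)), where c is the share of cognates two languages still have in common and r is the assumed retention rate per millennium. This is the textbook version, not the corrected procedure discussed in the article; the function name and the default r = 0.86 (a conventional value for 100-item lists) are assumptions made for illustration only.

```python
import math


def divergence_millennia(shared_fraction, retention_rate=0.86):
    """Classical estimate of time since separation, in millennia.

    shared_fraction -- share of cognates retained in common (0 < c <= 1)
    retention_rate  -- assumed share of the list kept per millennium (0 < r < 1);
                       0.86 is a conventional assumption, not a measured constant
    """
    if not 0.0 < shared_fraction <= 1.0:
        raise ValueError("shared_fraction must be in (0, 1]")
    if not 0.0 < retention_rate < 1.0:
        raise ValueError("retention_rate must be in (0, 1)")
    # Both languages change independently, hence the factor of 2.
    return math.log(shared_fraction) / (2.0 * math.log(retention_rate))
```

With the default rate, two languages sharing 0.86 squared (about 74%) of the list come out as separated for one millennium, which is a handy sanity check on the formula.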
A
Detailed Map of Indo-European Migrations
The map is based on the idea that all historical migrations tend to spread
symmetrically in a wave pattern, following the most suitable routes and avoiding
impenetrable geographical regions (see The Theory of Wave Migration). Otherwise,
generally accepted stuff, for the most part. A brief discussion of dates and locations
of main group proto-languages is included (last
corrected and updated 05/2009).
Free
to use for illustrative purposes. BTW, this map and some of the Balto-Slavic
data below have been used as educational materials by Vyacheslav
Ivanov (the famous coauthor of the glottalic theory) in a TV
lecture for the Rossiya-K (Culture) Channel (2010), which
perhaps may answer the question that people often ask concerning the usability
and credibility of the materials at this site (though, of course, it is always
up to the reader to decide what's hot and what's not).
A
Theory of Wave Migration and A Theory of the Indo-European Homeland near the Caspian
Sea This was an early, rather naive
work. Some parts may still be of interest, though. However, I no longer maintain
the "Caspian hypothesis", since the circles may also be traced to the
area near the Black Sea, thus almost confirming Gimbutas's classical model.
Still, the file could have a few interesting points (08/2005).
The
Comparative Indo-European Data Corpus (Isidore Dyen et al.) (doc)
A dataset by Isidore Dyen et al. (the 1980's?, published
in 1992) rearranged,
shortened, and colored to produce a more human-readable
output; unstable lexemes deleted. This dataset inspired my work on the
Big Table. Curiously, Dyen, who sadly passed away in 2008 at the age of 96, said
that his main goal in doing this research was to test whether the lexicostatistical method
was even workable and applicable to the study of Austronesian languages, in which
he took particular interest. Obviously, his research had wider application.
Indo-European
Wordlists (Ringe et al.) (pdf) An
Indo-European dataset by Don Ringe (2002) based on the enhanced 301-427-lexeme
Swadesh lists for 24 languages with some additional culturally significant lexemes.
Balto-Slavic
Lexicostatistics, Satemization, Palatalizations, etc (doc, in Russian)
(Updated)
This article attempts to explore phonological transitions in the
Baltic and Slavic languages and their lexical proximity. They are close, but the
Baltic languages still form a separate subgroup within the Balto-Slavic group;
we could also draw a rough analogy with the modern Iranian languages constituting
a complex gamut of multiple linguistic and genetic variations. (2009-11/2011)
A
Continuum of Slavic Languages (gif, in English) Based
on Girdenis-Maziulis' lexicostatistical research (1994). The Slavic languages
are placed geographically, which makes it basically a wave model instead of a
tree model. The length of each line is proportional to the lexicostatistical distance.
Conclusion: the traditional East-West-South grouping of Slavic languages seems
to be completely arbitrary and outdated—who made it up, anyway? Generally speaking,
for closely related languages, we should rather stick to wave models. Similar
objections to the "tripartite division" of Slavic were noted by Dyen
(1992).
Cognates
in modern English and Russian (doc, in English) How
many cognates would we spot in two distantly related Indo-European languages if
we had no previous knowledge of the classical languages? Would we even realize
that English is Indo-European, if it were the only Germanic language left? A simple
search shows that there are hardly more than 150-200 identifiable words in English
and Russian remaining from the original PIE stock. (01/2010)
Indo-Iranian
A
Map of Early Indo-Iranian Migrations
A rather detailed
map of Indo-Iranian migrations which places the Indo-Iranian and Proto-Iranian
Homeland near Bactria. There may be two plausible hypotheses regarding the
arrival of the early Indo-Iranians: (1) they may have traveled eastwards along
the Oxus (Amu Darya River), which at the time may have flowed into the Caspian
Sea, (2) they may have traveled from the Aral Sea region along the Yaxartes (Syr
Darya). In both cases, they finally reached the Pamirs and the present-day province
of Nuristan, where they split up and migrated into northern India. (1/2006,
last update 2008)
Indo-Iranian
Numbers (1-10), and the Internal Classification of Indo-Iranian Languages The
table is based on Mark Rosenfelder's dataset; colors normally indicate proximity
(red is "closer to a proto-form", green and blue are "more innovative").
The numbers just help to show what a realistic (not the "classical"
one) internal classification should look like.
A
Tree of Iranian Languages
A lexicostatistical Iranian dendrogram by Starostin (2004) and its version corrected
for possible statistical deviations. However, his recalibrated glottochronology
still seems to yield datings not corroborated by other research.
Turkic
The
Turkic Languages in a Nutshell (ver. 6.3)
(Updated again)
A bird's-eye
view on the Turkic peoples and their languages with word examples, photographs
and songs. The detailed internal classification of the Turkic language group
was based on the multilateral research described in detail below. Reviewed by
Yusuf B. Gürsey (2009-10). Basically, this web article is intended to be
a concise primer on Turkic ethnolinguistics (first online 04/2009,
updated 11/2010, updated 12/2011, last updated 04-05/2012).
The
Internal Classification and Migration of Turkic Languages (ver. 7.3)
(Updated again)
This article
includes all the theoretical considerations for the above-mentioned taxonomy.
Essentially, a new classification of Turkic languages has been built from scratch
on the basis of grammatical, phonological, lexicostatistical and historical research,
including several studies of the early migration routes of Turkic peoples. (First
online 04/2009, major changes by 12/2009, updated 11/2010, major update 12/2011,
last updated 04-05/2012).
The
Lexicostatistics and Glottochronology of Turkic Languages (ver. 3)
(Updated)
A classical
lexicostatistical study of 15 Turkic languages has been conducted using the 215-word
Swadesh lists, originally prepared for Wiktionary.org, and then expanded and verified
for possible semantic errors and borrowings.
The calculations were aided by a special PHP application. An
older version (2009-11) with the "glottochronological corrections" has
been removed due to internal complexities of the methodology. (First
online 10-12/2009, changes by 11-12/2011, major update 04/2012).
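To give an idea of what such a calculation involves, here is a rough sketch of the core lexicostatistical step: counting the share of meanings for which two languages use words of the same cognate class. This is not the PHP application mentioned above; the function names, the input layout (one cognate-class code per meaning, None where a word is missing or excluded as a borrowing), and the placeholder language labels are all assumptions for illustration.

```python
from itertools import combinations


def shared_cognate_percent(list_a, list_b):
    """Percentage of comparable meanings whose cognate-class codes match."""
    compared = 0
    shared = 0
    for class_a, class_b in zip(list_a, list_b):
        # Skip meanings missing (or excluded) in either language.
        if class_a is None or class_b is None:
            continue
        compared += 1
        if class_a == class_b:
            shared += 1
    if compared == 0:
        raise ValueError("no comparable meanings")
    return 100.0 * shared / compared


def pairwise_percentages(wordlists):
    """Return {(lang1, lang2): percent} for every pair of languages."""
    return {
        (a, b): shared_cognate_percent(wordlists[a], wordlists[b])
        for a, b in combinations(sorted(wordlists), 2)
    }


# Made-up toy codes for three hypothetical languages and five meanings.
toy = {
    "lang_A": [1, 1, 1, 1, None],
    "lang_B": [1, 1, 2, 1, 1],
    "lang_C": [2, 1, 2, 2, 1],
}
matrix = pairwise_percentages(toy)
```

The resulting matrix of percentages is what a dendrogram or a wave-style diagram would then be built from; the hard part, of course, is assigning the cognate classes correctly in the first place.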
The
Proto-Turkic Urheimat and The Early Migrations of Turkic Peoples (ver. 3.1)
(New)
A multilateral
linguistic, geographical, historical and archaeological analysis of Bulgaro-Turkic
and Proto-Turkic was performed attempting to find the position of the Urheimat
area and trace the earliest routes of Turkic migrations.
The main part of this article was originally included in The Internal
Classification [...] but was then turned into a separate publication (First
online 04/2009, changes in 11-12/2011, major update 04-05/2012).
Some
Mongolic/Tungusic Correspondences The
table is based on Starostin's datasets. It shows that Tungusic
and Mongolic have many nearly regular correspondences and are indeed related (up
to 38-40% in Swadesh-100; counted elsewhere). A brief description of Tungusic
languages is also included. (10/2009).
Nostratic
research
Comparing
Basic Lexemes in PIE and Semitic Languages —They Seem Quite Close Nostratics is yet another Big Bad Wolf for some of those poor
conservative riding hoods out there. Just grab some basic nouns, and check it
out for yourself. Generally, Proto-Semitic seems to be more closely related to
PIE than the southern Afro-Asiatic groups. Starostin, by the way, reached the
same conclusion in one of his articles (unpublished in English?)
(2008).
Iranian–Turkic–Mongolic
Regular Correspondences – Looks Strange! This table contains about 100 lexemes, which mostly reflect regular
correspondences in accordance with the classical principles of the comparative method.
For some reason, East Iranian languages seem to be particularly close to Turkic.
The results may have various interpretations: these may be nothing but Nostratic
correspondences; or the common lexemes may be related to the presence of a common
substratum; or the Turkic languages could be a lost "satemized" branch
of the Indo-European family, most likely of the Indo-Iranian or even early Iranian
stock, although highly modified by intense lenition and looking very different
from other Indo-Iranian languages. [Incidentally, *satem (hundred) was
*Ser or similar in Proto-Turkic.] Moreover, some genetic evidence is added
at the end of the work. Note that most of these lexical items are semantically
simple and historically stable. There's no contradiction with the Altaic theory
(that's precisely why the Mongolic data were included), quite to the contrary:
Proto-Turkic and Proto-Mongolic proto-forms seem to be in good agreement. (2008,
last updated 04/2009).
Armenian–Turkic
Correspondences — Even Weirder! A lengthy table. Part of the Turkic-vs-Indo-European theory
(see above). When I realized that Proto-Turkic might be related to IE languages,
I began to scan different IE groups for similarities with the Turkic languages,
and Armenian was part of that research. It seems to fit nicely – up to 40% in
the Swadesh-100. Note the high semantic stability of all lexical items. There's
no clear historical explanation for this observation so far, but evidently this
has something to do with Nostratic correspondences (2007).
Bayesian
phylogenetic analysis of Semitic languages A
glottochronological study by Andrew Kitchen, Christopher Ehret et al. (04/2009).
The authors conducted a classical cognate-based lexicostatistical analysis
of the 96-word Swadesh wordlist for 25 Semitic languages with local calibration
and an automatically constructed internal phylogeny. To cut a long story short,
they tentatively dated Proto-Semitic to about 3700 BCE, which seems to be more
or less consistent with other estimates (e.g. that of ASJP).
Other
Afro-Asiatic
The
Terrible Mess of Afro-Asiatic Numbers (1-10) and an Attempt to Get it Straightened
Out The table is based on Mark Rosenfelder's dataset; colors indicate proximity
(red is "closer to a
proto-form", green and blue are "more innovative").
Numbers provide little evidence for a clear-cut separation between Omotic and
Cushitic; consequently, all the "Omo-Cushitic" internal groupings had
to be sorted out and rearranged in a completely new fashion. The Berber numbers
are rather regular and predictable; on the contrary, the Beja group has non-Cushitic
numbers; the Chadic numbers seemed to be so messed up they were just left out.
(2007)
Linguistic
Geography
Dialect
and Language Borders Seem to Partly Coincide with Geographic Features This hypothesis suggests the existence of a rather natural
social
phenomenon resulting from the human tendency to adapt to the physical
environment and occupy a particular limited territory, which is then seen as their
"homeland". As a result, say, fishermen would tend to inhabit the seashore
and develop their own language over time, hunters would inhabit inland forests
and have their own language, farmers would occupy arable land, etc. Consequently,
all linguistic groups tend to acquire a natural physical habitat. Other non-geographical
factors should not be downplayed, however. (2005)
The
concept of "the Central Asian Bridge" The
existence of some sort of a proto-Silk Road as a strip of arable land unifying
West and East was suggested herein c. 2005. Similar ideas about the ancient
Silk Road were also developed at the Silk Road Symposium (2011) by Colin Renfrew
et al.
Linguistic
Typology
Could
There Be Giant
Typological Linguistic Areas Spreading
Across the Eastern Hemisphere? Conclusion:
there could, but not necessarily of a genetic type—secondary convergence is more
likely. Even though it would be very tempting to think that these areas represent
some kind of traces of ancient migrations extending historically beyond the Swadesh
horizon, there is no direct evidence for that. Still, this representation looks
fine and could be used for mnemonic, illustrative and other purposes (May
2006).
Lexicostatistics
How
Lexicostatistics Can Show English and Spanish Are Related, While Yoruba Is Not Just tired of that
anti-mass-comparison witch-hunt stuff now repeated by every fool at every corner,
so here's a drop of cold water on it.
No, by "fool" I definitely don't mean Rosenfelder, but his web
article
is cited so often, both rightly and out of context, that I just thought an opposing
view deserved some consideration
(preliminary notes, May 25, 2007 – 2008) (a permanent draft).
The subsequent research conducted by the ASJP group by 2009 took similar
considerations several steps further.
The
Classification of the World Languages Using Automated Phonostatistics (by
ASJP, 04/2009) (pdf)
You
should see this! Here's the frontline of historical linguistics of the 21st century.
What these
guys did was take 40 basic words in each language of the world,
transcribe them phonetically, and compare them phoneme-by-phoneme using a relatively
simple program. As a result, you get a tree that is at least 80-90% correct, depending
on the quality of your algorithm and your word lists. (Actually, they use the
rather oversimplified Levenshtein algorithm, but that's just one of the many possible
ways.) No more painstaking manual comparativism! You
can call this method phonostatistical; it's entirely new.
(I was initially beginning to
do exactly that in 2007 with my IE list-46, but was unable to finish the
computer program, since I was no pro programmer. Well, anyway, great minds think
alike.) Outstanding research! For instance, no matter how simple their method
is, their IE dendrogram is close to real. Just for consideration: Armenian is
finally detached from Greek and placed into Satem, where it is probably supposed
to be; Welsh and Irish are correctly shown as strongly differentiated; their Omotic-Cushitic
tree is largely similar to the taxonomy I have obtained using a manual comparison
of 1-10 numbers—and that's all just the first-order approximation. This is in
fact the revolution of computerized statistical methods in linguistics
that already took place in molecular biology. Words are DNA, phonemes are nucleotide
bases, in the best spirit of Dawkins's analogy between genes and memes. Great job!
Way to go!
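For the curious, the phoneme-by-phoneme comparison described above can be sketched in a few lines. This is only a simplified illustration, not the actual ASJP program: it assumes each word is a string with one character per phoneme, divides the plain Levenshtein distance by the length of the longer word, and averages over the meanings present in both lists. All function names here are made up for the example.

```python
def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    prev = list(range(len(b) + 1))
    for i, sym_a in enumerate(a, start=1):
        curr = [i]
        for j, sym_b in enumerate(b, start=1):
            cost = 0 if sym_a == sym_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_distance(a, b):
    """Edit distance divided by the longer word's length (0 = same, 1 = all different)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def list_distance(words_a, words_b):
    """Average normalized distance over meanings attested in both word lists."""
    pairs = [(a, b) for a, b in zip(words_a, words_b) if a and b]
    if not pairs:
        raise ValueError("no comparable word pairs")
    return sum(normalized_distance(a, b) for a, b in pairs) / len(pairs)
```

Running list_distance over every pair of languages gives a distance matrix, which any standard clustering routine can then turn into a tree. Better sound-aware weighting of substitutions would be an obvious refinement.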