Revised
linguistic and ethnological theories and hypotheses, surveys,
(un)official articles, new ideas, maps, criticism, latest
news in historical linguistics
Indo-European
46
Basic Lexemes in All Indo-European Language Groups (Big Scary
Html Table)
Took months to collect, and months to correct.
A sort of bird's-eye view of the Indo-European lexis.
All the Indo-European languages have been divided into about 35
(sub)groups, whose basic vocabulary is seen as considerably different
from that of their neighbors. The internal classification and
ordering of some of these (sub)groups have been modified and rearranged
based on the newly found lexical proximity; otherwise, just factual
data (last corrected and updated 09.2008).
A
Lexicostatistical Classification of Main Indo-European Language
Groups (draft notes)
This lengthy
article discusses lexicostatistical results obtained from
the Big Table using both statistical and classical argumentation.
This text attempts to build a revised phylogeny of the Indo-European
languages. Also, some tentative attempts at doing glottochronology correctly
(2008).
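For readers who have never seen the arithmetic behind glottochronology, here is a minimal sketch of the classical Swadesh formula, t = ln(c) / (2 ln(r)), where c is the share of cognates two languages still have in common and r is the retention rate per millennium. This is the textbook version, not the corrected procedure discussed in the article; the function name and the default r = 0.86 (the value often quoted for the 100-word list) are assumptions made for illustration.

```python
import math


def divergence_time(shared_fraction, retention_rate=0.86):
    """Classical Swadesh estimate of time since separation, in millennia.

    shared_fraction: share of cognates retained in common, in (0, 1].
    retention_rate:  assumed retention per millennium (0.86 is the figure
                     often quoted for the 100-word list; an assumption here).
    """
    if not 0.0 < shared_fraction <= 1.0:
        raise ValueError("shared_fraction must be in (0, 1]")
    if not 0.0 < retention_rate < 1.0:
        raise ValueError("retention_rate must be in (0, 1)")
    # Both languages lose vocabulary independently, hence the factor of 2.
    return math.log(shared_fraction) / (2.0 * math.log(retention_rate))


if __name__ == "__main__":
    # Hypothetical input: two languages sharing 70% of a basic word list.
    print(round(divergence_time(0.70), 2), "millennia")
```

The formula is very sensitive to r, which is one reason why recalibrated versions can yield rather different datings.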
A
Detailed Map of Indo-European Migrations
The map is based on the idea that all historical migrations
tend to spread symmetrically in a wave pattern, following the
most suitable routes and avoiding impenetrable geographical regions
(see Theory of Wave Migrations). Otherwise, generally accepted
stuff, for the most part. Free to use for illustrative purposes.
A brief discussion of datings and locations of main group proto-languages
included (last corrected and updated 05.2009).
A
Theory of Wave Migrations and A Theory of the Indo-European Homeland
near the Caspian Sea
This was my early, rather naive work (2005).
Some parts may still be of interest, though. However, I no longer
maintain the Caspian hypothesis, since the "circles"
may also be traced to the area near the Black Sea, thus almost
confirming Gimbutas's classical model. Still, the file could
have a few interesting points.
The
Comparative Indo-European Data Corpus (Isidore Dyen et al.) (doc)
A dataset by Isidore Dyen et al. (1980s?,
published in 1992), rearranged,
shortened, and colored to
produce a more human-readable output; unstable lexemes deleted.
This dataset inspired my work on the Big Table. Curiously, Dyen,
who sadly passed away in 2008 at the age of 96, said that his
main goal in doing this research was to test whether the lexicostatistical
method was even workable and applicable to the study of Austronesian
languages, in which he took particular interest. Obviously, it
has far wider application.
Indo-European
Wordlists (Ringe et al.) (pdf)
An Indo-European
dataset by Don Ringe (2002) based on the enhanced 301-427-lexeme
Swadesh lists for 24 languages with some additional culturally
significant lexemes.
Balto-Slavic
Lexicostatistics, Satemization, Palatalizations, etc. (doc, in
Russian)
The article attempts to explore phonological transitions
in the Baltic and Slavic languages and their lexical proximity.
They are close, but the Baltic languages still form a separate
subgroup within the Balto-Slavic group; we could also draw a rough
analogy with the modern Iranian languages constituting a complex
gamut of multiple linguistic and genetic variations.
Lithuanian-Latvian-Russian
100-word Swadesh List (doc)(2)
Over and over again, the lexicostatistical comparison
demonstrates the close proximity of Slavic to Baltic, although
Lithuanian and Latvian are even closer. Nihil novi.
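To make "lexicostatistical proximity" concrete, here is a minimal sketch of the basic count: for every meaning on the list, check whether the two languages use words from the same cognate class, then take the share of matches. The language names, meanings, and cognate-class numbers below are invented toy data, not values from the actual Lithuanian-Latvian-Russian list, and the function name is hypothetical.

```python
def shared_cognate_share(list_a, list_b):
    """Share of meanings for which both lists carry the same cognate-class label.

    Each list maps a meaning to a cognate-class label (None if the slot is empty).
    Only meanings filled in both lists are counted.
    """
    meanings = [
        m for m in list_a
        if m in list_b and list_a[m] is not None and list_b[m] is not None
    ]
    if not meanings:
        raise ValueError("no comparable meanings")
    matches = sum(1 for m in meanings if list_a[m] == list_b[m])
    return matches / len(meanings)


# Toy data: cognate-class labels are arbitrary integers, invented for illustration.
toy_lists = {
    "lang_x": {"water": 1, "fire": 1, "hand": 1, "stone": 1, "night": 1},
    "lang_y": {"water": 1, "fire": 1, "hand": 1, "stone": 2, "night": 1},
    "lang_z": {"water": 1, "fire": 2, "hand": 3, "stone": 2, "night": 1},
}

if __name__ == "__main__":
    names = sorted(toy_lists)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            share = shared_cognate_share(toy_lists[a], toy_lists[b])
            print(f"{a} vs {b}: {share:.0%}")
```

In this toy set, lang_x and lang_y come out closest to each other, with lang_z further from both, which is the same kind of pattern described above for the two Baltic languages and Russian. The hard part in real work is the cognate judgment itself, which this sketch simply takes as given.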
A
Continuum of Slavic Languages (gif, in English)
Based
on Girdenis-Maziulis' lexicostatistical research (1994). The Slavic
languages are placed geographically, which makes it basically
a wave model instead of a tree model. The length of each line
is proportional to the lexicostatistical distance. Conclusion:
the traditional East-West-South grouping of Slavic languages seems
to be completely arbitrary and outdated—who made it up, anyway?
Generally speaking, for closely related languages, we should rather
stick to wave models. Similar objections to the "tripartite
division" of Slavic were noted by Dyen (1992).
Cognates
in modern English and Russian (doc, in English) NEW
How many cognates would we spot in two distantly related Indo-European
languages if we had no previous knowledge of the classical languages?
Would we even realize that English is Indo-European, if it were
the only Germanic language left? A simple search shows that there are
hardly more than 150-200 identifiable words in English and Russian
remaining from the original PIE stock. (01.2010)
Indo-Iranian
A
Map of Early Migrations of Indo-Iranian peoples
A rather detailed map of
Indo-Iranian migrations which places the Indo-Iranian and Proto-Iranian
Homeland near Bactria.
There may be two plausible hypotheses regarding the arrival of
the early Indo-Iranians: (1) they may have traveled eastwards
along the Oxus (Amu Darya River), which at the time may have flowed
into the Caspian Sea, or (2) they may have traveled from the Aral
Sea region along the Jaxartes (Syr Darya). In both cases, they
finally reached the Pamir Mountains and the present-day province
of Nuristan, where they split up and migrated into northern India.
(2008)
A
Tree of Iranian Languages (gif)
A lexicostatistical Iranian dendrogram
by Starostin (2004) and its version corrected for possible statistical
deviations (however, his recalibrated glottochronology still seems
to yield datings not corroborated by other research).
Turkic
The
Turkic Languages in a Nutshell (big page) (ver. 4.1)
A bird's-eye view of the
Turkic peoples and their languages with many photographs.
Also, a detailed internal classification of the Turkic language
group based on phonological transitions in 1-10 numbers, 9 carefully
selected basic words, as well as other historical and linguistic
evidence (see below). Reviewed
by Yusuf B. Gürsey. Basically,
this web article is intended to be a concise primer on
Turkic ethnolinguistics (first online 04.2009, last updated
10.2009).
The
Internal Classification and Migrations of Turkic Languages (ver.
4.2)
Theoretical considerations and explanations
of the above-mentioned taxonomy. A new classification of Turkic
languages has been built from scratch on the basis of grammatical,
phonological, and lexicostatistical research, including a study
of the early migration routes of Turkic peoples in the Altai-Sayan
mountain system and a quite earnest glottochronological study
(see below) (first
online 04.2009,
last updated 12.2009).
Iranian–Turkic–Mongolic
Regular Correspondences —
Looks Strange!
This table containing about 100
lexemes, which mostly
reflect regular correspondences in accordance with the classical
principles of comparative method,
attempts to demonstrate that the Turkic languages belong to the
Eastern branch of the Indo-European family (sic!). (An opposing
view states that these are nothing but Nostratic correspondences.)
Note that most lexical items are semantically simple and historically
stable.
Moreover, some genetic (non-linguistic) evidence is added at the
end of the work. Conclusion:
the Turkic languages could be a lost "satemized" branch
of the Indo-European family, most likely of the Indo-Iranian or
early Iranian stock, although highly modified by intense lenition
and looking very different from other Indo-Iranian languages.
[Incidentally, *satem (hundred) was *Ser or similar
in Proto-Turkic.] For some reason, East Iranian languages seem
to be particularly close to Turkic; it may be due to the presence
of a common substratum. The whole matter could have something
to do with an early separation of Proto-Turkic c. 2500-3000 BC,
but it's still hard to tell. There's no contradiction with the
Altaic theory (that's precisely why the Mongolic data were included),
quite to the contrary: Proto-Turkic and Proto-Mongolic proto-forms
seem to be in good agreement (2008,
last updated 04.2009).
Armenian–Turkic Correspondences —
Even Weirder!
A lengthy table. Part of the Turkic-vs-Indo-European
theory (see above). When I realized that Proto-Turkic might be
related to IE languages, I began to scan different IE groups for
similarities with the Turkic languages, and Armenian was part
of that research. It seems to fit nicely – up to 40% in the Swadesh-100.
Note the high semantic stability of all lexical items. There's
no clear historical explanation for this observation so far.
(2007)
Other Altaic
Some
Mongolic/Tungusic Correspondences NEW
The table is based on Starostin's datasets.
It shows that Tungusic and Mongolic have many nearly regular correspondences
and are indeed related (up to 38-40% in Swadesh-100, counted elsewhere).
A brief description of Tungusic languages included (10.2009).
Comparing
Basic Lexemes in PIE and Semitic Languages — They Seem Quite Close
Nostratics is yet another Big Bad Wolf for
some of those poor conservative riding hoods out there. Just grab
some basic nouns, and check it out for yourself. Generally, Proto-Semitic
seems to be more closely related to PIE than the southern Afro-Asiatic
groups. Starostin, by the way, reached the same conclusion in
one of his articles (unpublished in English?).
Other Afro-Asiatic
The
Terrible Mess of Afro-Asiatic Numbers (1-10) and an Attempt to
Get it Straightened Out
The table is based on Mark Rosenfelder's dataset; colors
indicate proximity (red is "closer to a
proto-form", green and blue are "more
innovative"). Numbers provide little evidence for a clear-cut
separation between Omotic and Cushitic; consequently, all the "Omo-Cushitic"
internal groupings had to be sorted out and rearranged anew. The
Berber numbers are rather regular and predictable; by contrast,
the Beja group has non-Cushitic numbers, and the Chadic numbers are
so messed up they were just left out. (2007)
Linguistic
Geography
Dialect
and Language Borders Seem to Partly Coincide with Geographic Features
This is a hypothesis that suggests the existence
of a rather natural social
phenomenon resulting from the human tendency to
adapt to the physical environment and occupy a particular limited
territory, which is then seen as their "homeland". As
a result, say, fishermen would tend to inhabit the seashore and
develop their own language over time, hunters would inhabit inland
forests and have their own language, farmers would occupy arable
land, etc. Consequently, all linguistic groups tend to acquire
a natural physical habitat. Other non-geographical factors should
not be downplayed, however. (2005)
Linguistic
Typology
Could
There Be Giant
Typological Linguistic Areas Spreading
Across the Eastern Hemisphere? Conclusion:
there could, but not necessarily of a genetic type—secondary convergence
is more likely. Even though it would be very tempting to think
these areas represent some kind of traces of ancient migrations
extending historically beyond the Swadesh horizon, there is no
direct evidence for this. Still, the representation looks fine
and could be used for mnemonic, illustrative and other purposes
(2006).
Lexicostatistics
How
Lexicostatistics Can Show English and Spanish Are Related, While
Yoruba Is Not
Just tired of that anti-mass-comparison witch-hunt stuff now repeated
by every fool at every corner, so here's a drop of cold water
on it.
No, by "fool" I don't mean Rosenfelder, but his web
article is cited so often both rightfully and out-of-context,
that I just thought an opposing view deserved some consideration
(2007-2008) (a draft article).
The
Classification of the World Languages Using Automated Phonostatistics
(by ASJP, 04.2009) (pdf)
You
should see it! Here's the frontline of historical linguistics
of the 21st century. What these
guys did was take 40 basic words in each language
of the world, transcribe them phonetically, and compare them phoneme-by-phoneme
using a simple program. You get a tree that is at least 80-90%
correct, depending on the quality of your algorithm and your word
lists. (Actually, they use the much too simplified Levenshtein
algorithm, but that's just one of the many possible ways.) No
more painstaking manual comparativism! You
can call this method phonostatistical; it's entirely new.
(I was initially
beginning to do exactly that in 2007 with my IE list-46,
but was unable to finish the computer program, since I'm no pro
programmer. Well, anyway, great minds think alike.) Outstanding
research! For instance, no matter how simple their method is,
their IE dendrogram is close to real. Just for consideration:
Armenian is finally detached from Greek and placed into Satem,
where it is probably supposed to be; Welsh and Irish are correctly
shown as strongly differentiated; their Omotic-Cushitic tree is
largely similar to the taxonomy I have obtained using a manual
comparison of 1-10 numbers—and that's just the first-order approximation.
This is in fact the same revolution of computerized statistical
methods in linguistics that already took place in molecular biology.
Words are DNA, phonemes are nucleotide bases, in the best spirit
of Dawkins's analogy between genes and memes. Great job! Way to
go!
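The phoneme-by-phoneme comparison described above can be sketched roughly as follows: compute the Levenshtein (edit) distance for each pair of words with the same meaning, divide by the length of the longer word, and average over the list. This is only a simplified illustration, not the ASJP program itself, whose transcription and normalization are more involved and are not reproduced here. The function names are hypothetical, and the sample words are plain English and German spellings rather than phonetic transcriptions.

```python
def levenshtein(a, b):
    """Edit distance: minimum number of insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_distance(a, b):
    """Edit distance divided by the length of the longer word (0.0 to 1.0)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def list_distance(words_a, words_b):
    """Mean normalized distance over two word lists aligned by meaning."""
    pairs = list(zip(words_a, words_b))
    if not pairs:
        raise ValueError("empty word lists")
    return sum(normalized_distance(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Orthographic forms used only for illustration, aligned by meaning.
    english = ["hand", "water", "night"]
    german = ["hand", "wasser", "nacht"]
    print(round(list_distance(english, german), 3))
```

Feeding a full matrix of such pairwise distances into any standard clustering routine is what produces a tree; the quality of the result depends mostly on the transcription and on how the distance is normalized.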
In
case you can't get through occasionally, it's a problem with the
hosting (runhosting.com); apparently, they limit the number of
visitors artificially, trying to bulldoze people into paying.
Geocities, where I used to stay, the Titanic of the golden era,
was using exactly the same tactics and finally sank to the bottom,
making me move in here in a lifeboat, but apparently it's the
same story.