DISCUSSING THE REVISED PHYLOGENY
Hypothesis: Albanian may be related to Celtic
Evidently,
there's a profound dissimilarity between Goidelic and Brythonic languages, which
makes "Celtic" a rather deep, archaic grouping similar in this respect
to the Balto-Slavic or Iranian branches. In fact, the lexicostatistical
depth of 52% assumed herein seems to be greater than that of the Balto-Slavic group (65%). Consequently,
the great depth of the Celtic branch may finally allow us to add Albanian,
with its similar lexicostatistical separation of about 50%, to the Celtic languages.
Is Albanian really Celtic? The similarities between Albanian and other
Celtic languages can be directly exemplified by the following lexemes, some of
which may turn out to be unique shared innovations: OIr.
uisce, Alb. uje, ujt (water) (obvious phonological similarity),
rather opposed to Latin unda 'wave' with a modified meaning; OIr.
ainm; Old Welsh anu;
Gheg êmën (name) (apparently, a metathesis from *namen >
*anmen, rather unique in Europe); OIr.
<súil> /su:il/; Alb. <sy> /sü/ (eye)
(the relatedness to IE "sun" seems doubtful because of semantic differences);
OIr. duille, Welsh deilen; Alb. <gjethe>
/dJeße/ (leaf), as opposed to Latin folium, Gr. fúllon;
OIr. carric 'rock'; Welsh carreg; Alb. gurë
(stone) (probably akin to Eng. hard, and Toch. B kärweñe
'stone'); Welsh gwraig /gura'ig/, Alb. grua (wife,
woman); Welsh w^y; Br. vi; Alb. vezë;
ve (egg) (phonologically similar); OIr. gin; Welsh
or Cornish genau, Br. quen, Alb. goje
(mouth) as opposed to Latin gena 'cheek', Eng. chin with a different
meaning; OIr. athir, Alb. atë (father) (a
similar development, which may not be coincidental); Ir. féar,
Welsh gwair; gwellt; Alb. bar (grass) ;
[Also cf. the regular correspondence of Irish /f/ : Welsh /g/
: Alb. /b/ in Ir. <fear>, OIr. <fer>;
Old Welsh <gur>; Alb. burrë (man) (as
well as Latin vir; Anglo-Saxon wer, Lith. viras), and OIr
<find>, Welsh <gwyn>, Alb. <bardhë>
/barðê/ 'white']
OIr. tech; Welsh <ty^>; Br. ti;
Alb. shtëpi (house) (also Gr. stegn 'house', but
Eng. thatch; Sanskrit stagati 'to cover' mostly with different
meanings) (?); Manx shimmey, Alb. shumë
(many); Welsh <gwddf>, Alb. qafë (neck);
Ir. ur, Alb. ri (new); Ir. maith,
Welsh mad, Alb. mirë (good);
On the other hand, there are a few unique Goidelic/Brythonic innovations (as long
as these matches do not result from subsequent mutual borrowings).
OIr. macc, Old Welsh map (son) (IE root, semantically
innovative); OIr. tene, Welsh tân (fire)
(IE root, semantically innovative); OIr. lám, Old Welsh
lau (hand) (akin to Latin palma, Anglo-Saxon folm 'palm',
typically Celtic, but not unique); OIr. carric, Old
Welsh carrec (stone) (IE root, rather semantically innovative, but
also cf. Alb. gurë);
Note that the Irish-to-Welsh percentage may be a little lower than the actual figure because
of the greater than usual number of dialectal (?) synonyms within the Welsh
dataset, which are counted herein as 0.5-0.3 per lexeme, so we might expect
the corrected figure for Irish/Welsh relatedness to be a little higher (about
57%?). Accordingly,
this predicts two waves of migration into the British Isles, with Proto-Goidelic
being the first to enter, and the Brythonic subgroup being a result of relatively
recent migration from the Continent. Proto-Brythonic and Proto-Goidelic must have
separated a long time ago somewhere in northwestern or central continental Europe.
As to Italo-Celtic, the current study is not sufficiently detailed
either to completely exclude or to corroborate the possibility of an Italo-Celtic
grouping; rather, we see it as a possible but unlikely, and in that case very
short-lived, state within the European Centum branch. There seem to be no specific
Italo-Celtic shared innovations in the 46-list, except for the typical Celt. ni
: Latin nos (we), which is also attested in other Indo-European groups,
and is not unique.
Hypothesis: Italic may be related to Hellenic
These two groups seem to have very much in common (herein ~69%), which should not be
surprising, since the close proximity of Attic Greek to Latin has been well known since
antiquity. Consider the following phonological and semantic similarities from
the 46-table (Latin and Greek are transcribed phonetically):
Latin duo;
Gr. dú:o (but Welsh dau; Old English
tva; Pruss. dwai. Lith. du); Latin kwattuor; Myc.
Gr. kwetoro (but OIr. cethir; Alb katër;
Old English feower; Lith. keturì); Latin ego;
Gr. egó: (but Old English ik; Toch. ñäs;
Welsh i; Alb.unë); Latin pes;
Gr pú:s (foot); Latin noks; Gr.
nü:ks (but Welsh nos; Alb. natë; Old English
niht); Latin humus (ground); Gr. *xamos,
xamai 'on the ground'; Latin folium; Gr.
fúllon (leaf); Latin frater; Gr.
phra:ter (brother) (but br- in most other IE languages, Sanskrit
"bhra:taH"); Latin
lupus; Gr. lükos (wolf) (a similar loss of initial
v-, which was rather unique among other IE groups); Latin petra;
Gr. pétros (stone) (as opposed to I. cloch; Alb.
gur; Toch. B kärweñe; Old English sta:n); Latin
domus; Gr. dómos (home); Latin rivus;
Gr. rheos (river);
In
all of the above instances we observe close phonological and semantic proximity
that can be explained by assuming a genetic unity of Italic and Hellenic languages.
This is easily explained from the geographical perspective by considering the
fact that one of the few feasible passages to the Italian peninsula goes through
the southern Balkans and northern Greece, therefore the only geographically realistic
way for Proto-Italic to form was by its separation from Proto-Hellenic at some
point in time. However, the lexicostatistical proximity of only about
45% between Modern Greek and modern Romance languages (such as Spanish) as compared
to an average of about 40% among other modern European Centum languages indicates
that the Italo-Hellenic proto-state was rather short-lived and unstable.
Hypothesis: Germanic may be related to Tocharian
An even more interesting find may be a possible proximity of Proto-Germanic to Proto-Tocharian
(Old Eng. : Toch. B ~ 65%; German : Toch. B ~ 59%). This observation deserves further
investigation:
Old English wæter; Toch. A wär; Toch. B
wer < *wat'er (?) (but Ir. uisce; Welsh
dwr; Gr. hüdo:r;
Lith. vanduõ); Goth. swistar; Toch.
A s'ar; Toch. B s'er <*set'er (sister) (the same loss
of aspirated intervocalic -t'-); Goth. weis; Toch. B
wes (we) (also, at least Lith. vedu 'we two' and OCS ve
'we two', but not as 'we' in the phonological form of *weis, and not in the European
Centum languages); Goth. hairto; Toch. B arañce
<*harnte (?) (heart); Goth. waurts; German Wurzel;
Toch. A witsako (root); German Blatt; Toch.
A pält; Toch. B pilta (leaf, blade) (this root is
also persistent in the Indo-Iranian branch); German Stamm 'stem';
Toch. A s.tám; Toch. B stám (tree);
Goth. waurms; Toch. A wal (worm) (but also Latin
vermis (with a full ending); Gr. rhomos; I. cruimh, Alb.
krimb, Pruss. <Girmis>);
Consider also the strong aspiration in t'-, which led to the transformation t'
> ts (not necessarily due to palatalization, as normally explained): Toch.
B mácer (mother); Toch. B pacer (father); Toch.
B tkácer (daughter); Toch. B kuce (who); Tocharian
k- finds explanation as a strongly aspirated t' > tk' > k' >
k (Apparently, the digraph <tk> as preserved in Toch. A tkam
(earth); Toch. A ckácer; Toch. B tkácer
marks the result of this aspiration.)
Here is a short lemma that attempts to prove a regular correspondence between Proto-Tocharian
*ka- and *tV- in the European Centum languages: Toch.
A kam; Toch. B keme, hence Proto-Toch. *kam < *tham (tooth);
Toch. A kantu; Toch. B kantwo, hence Proto-Toch. *kantwo
< *thank'wo (with a metathesis) (tongue); Toch. A kom.; Toch.
B kaum., hence Proto-Toch. *kaum < *thaum (day, sun); Toch.
A tkam.; Toch. B kem., hence Proto-Toch. *kam (tkam) < *tham
(earth, cf. Latin tellus, OIr. ti:r); Toch. A karke;
Toch B kara:k, hence Proto-Toch. *karak < *tharak, *tharakh, *tharah
(tree branch); Toch. A kayurs'; Toch. B kaurs'e, hence Proto-Toch.
*kaurs'e < *thaurse (bull, cf. Taurus); Toch. A kälyt-är;Toch.
B kalt-är, hence Proto-Toch. *kalt-ar < *thalt-ar (s-tand);
(?) [Yet, in some
other cases we have k < *k: Toch. A känt; Toch. B
kante, hence Proto-Toch. *kente (hundred); Toch. A kanwem.;
Toch. B keni, hence Proto-Toch. *ken- (knees (du.));]
The former process is possible if Proto-Tocharian stops were heavily aspirated, hence
*ta > *tha > *hha > *kha > *ka before an open /a/ when the
dentals were undergoing an allophonic lenition. The metathesis in *tankwo
occurred precisely under the impact of aspiration, because both *th and
*kh were pronounced in a rather similar way at some point, more or less
like *hhanhhwo.
The Tocharian aspiration is reminiscent of Grimm's law and the aspiration in the West
Germanic languages.
Part of Grimm's law seems to be already in progress in early Proto-Tocharian, since
we have *k > *h > 0 in:
Goth. dauhtar; Toch. B tkácer (but Gr. thügáte:r)
Goth. hairto; Toch. B arañce <*harnte (?)
Other examples of Germano-Tocharian analogies might include: Toch.
A kumn-äs'; Toch. B känm-as's'äm; German
kommen (come) [cf. Skt. gamati "he goes," Avestan
jamaiti "goes," Lith. gemu "to be born," Gk. bainein
"to go, walk, step," Latin venire "to come", which
do not have the same semantic and phonological form as in German and Tocharian]
Toch B s'ayye; German Schaf (sheep) [no known cognates
outside Germanic. The more usual IE word for the animal was *ewe.]
It should not be particularly surprising that the Proto-Tocharians wandered as far
as the Taklamakan desert; remember that we have a massive Gothic migration
to the Crimean peninsula and the rest of Europe about two thousand years
later. The Indo-Europeans used horses, whereas the vast Ponto-Caspian and Central
Asian steppes allowed for distant migrations across Eurasia.
I do not insist on Proto-Tocharian / Proto-Germanic unity; at this level that's
just a tentative hypothesis, which follows from the data under consideration,
but which is rather poorly demonstrated herein.
The Balto-Slavic unity is well-proven
The close proximity of the Baltic and Slavic languages (herein 65%) is well supported
by many other studies (Dyen (1991), Ringe (2005)), including some articles you
can find at this site. You can also easily see a number of shared Balto-Slavic
lexical innovations in the present 46-lexeme list:
Pruss. ranko; Lith. ranká; Latv. roka; OCS ro~ka [nasal]; Russ. ruká (hand, arm)
Pruss. nage; OCS noga (foot, leg)
Pruss. zwaigstan (or rather: swaigstan) 'the shining'; Lith. zhvaigzhde; Latv. zvaigzne; OCS zvezdá (star)
Pruss. zirgis "stallion"; Lith. zhirgas "horse"; Latv. zirgs "horse"; Russ. zhere-béts "stallion"
The close genetic proximity of both groups is evident to anyone familiar with any
two Baltic and Slavic languages. It doesn't really take any research. Some selected
words and phrases may not even require translation, and some meanings can even
be figured out with some effort and the knowledge of regular correspondences.
[Cf. as anecdotal evidence (phonetical transcription): Kaip ash buváu
ministru "How I was a Minister" (a book by Zinkevicius),
but a possible Russian translation Kak ya (also OCS azê and
Bulgarian az) byl (also: byvál) minístrom; or Lith.
Líye litús "Rains (pours) the rain" vs. Russ. Lyót
líven' "Pours the shower/rain"] However, this close relationship
should not be oversimplified or overestimated, nor does it mean that Lithuanian
or Latvian is directly readable to speakers of Slavic languages and vice
versa. In my personal humble opinion, arguments against a Balto-Slavic genetic
grouping can only come from western researchers unfamiliar with any of these
languages, or from nationalistically minded Balts who view any relation to Slavic as
insulting. This long-standing dispute should finally be closed.
On the other hand, the difference between modern Lithuanian and Latvian seems rather
pronounced. According to a lexicostatistical study by Girdenis & Mazhiulis (1994),
we have 68% for the Lithuanian-Latvian pair, and 70% for the Russian-Macedonian
pair, the two most lexicostatistically distant Slavic languages, whereas the average
inter-Slavic lexicostatistical distance normally oscillates around 75%. We should
also take into consideration possible historical contacts between Proto-Latgalian-Latvian
and Proto-Lithuanian-Samogitian throughout their history, which would further decrease
the figure for the Baltic languages to about 62% because of possible mutual
borrowings. This leads us to the conclusion that the Baltic group has many internal
differences and is generally a little older than the Slavic group. See [Girdenis,
Mazhiulis (1994)].
As to the Balto-Slavic lexicostatistical relatedness in that research, we have an
average of 46% for Lithuanian vs. Slavic and an average of 42% for Latvian vs.
Slavic, or ~44% on average. This yields a 62 - 44 ~ 18% difference
between the hypothetical lexicostatistical depth of the Baltic and Slavic groups.
Girdenis, Mazhiulis (1994), Swadesh-200, cognates
| | Lithuanian | Latvian | Old Prussian* | Russian |
| Lithuanian | | 68% | (49%) | 47% |
| Latvian | | | (44%) | 45% |
| Old Prussian | | | | (41%) |
(*Their data for Prussian are probably unreliable, because there is not enough attested material
to fill in a Swadesh-200 list.)
I
have also conducted my own lexicostatistical study using an unconventional list
of wild flora/fauna (81 lexemes) which is supposed to be much less
affected by loanwords due to presumably high stability of this type of basic vocabulary
(see Balto-Slavic Lexicostatistics
(in Russian)). This flora/fauna list yielded the following percentages:
Wild fauna/flora, 81 lexemes, cognates (2008)
| | Lithuanian | Latvian | Old Prussian | Russian |
| Lithuanian | | 64% | 67% | 48% |
| Latvian | | | 58% | 46% |
| Old Prussian | | | | *51% |
(*The Prussian
percentage should be decreased by a small amount, because of the 600-year difference
between the attested Old Prussian and a hypothetical "Modern Prussian",
but that would not affect the final outcome to any significant extent.)
Incidentally, that nearly coincides with Mazhiulis' data (again, different
lexical lists normally coincide in absolute figures only by accident), hence
we have (64 + 67 + 58)/3 = 63% for the average relatedness among the Baltic
languages, and (48 + 46 + 51)/3 ~ 48% for the average relatedness of Russian
to Baltic. Again, we have a ~15% difference between the hypothetical glottochronological
age of the Baltic and Slavic groups in this study.
Finally,
the above figures partly corroborate the calculations of the present preliminary
study (the 46-list):
The 46-list; cognates with phonological similarity (2008)
| | Lithuanian | Latvian | Old Prussian | Russian |
| Lithuanian | | ~78% | ~76% | ~67% |
| Latvian | | | ~68% | ~66% |
| Old Prussian | | | | ~64% |
Herein, we have a
[(78 + 76 + 68)/3] - [(67 + 66 + 64)/3] ~ 8% difference between Baltic and
Slavic (Russian). The smaller difference may be attributed to the much higher stability
of the 46-list and a different method of counting.
Consequently, the difference within the Baltic languages is a little greater than normally assumed,
whereas the difference between Slavic and Baltic is less than normally assumed,
which makes Balto-Slavic a statistically reasonable grouping. It is true, however,
that the Slavic languages cannot be directly included in the Baltic group as
a subgroup; that would be going too far. Rather, they seem to have separated much
earlier than most Baltic languages.
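The averaging used for the last two tables above can be checked with a few lines of Python. This is only an arithmetic sketch: the percentages are copied from the tables, while the function name `group_gap` and the variable names are assumptions introduced here for illustration and are not part of the original study.

```python
from statistics import mean


def group_gap(within_baltic, baltic_vs_russian):
    """Return (Baltic average, Baltic/Russian average, difference) in percent."""
    inner = mean(within_baltic)
    outer = mean(baltic_vs_russian)
    return inner, outer, inner - outer


# Pair order: Lith/Latv, Lith/OPr, Latv/OPr and Lith/Rus, Latv/Rus, OPr/Rus.
flora_fauna_81 = group_gap([64, 67, 58], [48, 46, 51])
list_46 = group_gap([78, 76, 68], [67, 66, 64])

for name, (inner, outer, gap) in [("flora/fauna", flora_fauna_81),
                                  ("46-list", list_46)]:
    print(f"{name}: Baltic {inner:.1f}%, vs. Russian {outer:.1f}%, gap {gap:.1f}")
```

Rounded to whole percentages, the results agree with the figures quoted in the text (63 vs. 48 for the flora/fauna list, and a gap of about 8 for the 46-list).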
Hypothesis:
Is Balto-Slavic related to Germanic?
The present lexicostatistical finding that modern Baltic and Slavic
are related to modern Germanic at about 50% is at odds with the pronounced
satemization in Balto-Slavic. Herein, we have BS/Germanic ~ 50%, and BS/Indo-Iranian
~ 35%, which could be due to a lexicostatistical error. The close match may also
be attributed to the archaism of both groups. Nor are there any clear-cut
innovations shared by Balto-Slavic and Proto-Germanic in the 46-list. More extensive
research on the subject is needed to support or discard either of the hypotheses.
West Iranian
First
of all, it should be noted that the traditional four-corner scheme of Iranian
languages (Northwest, Southwest, Northeast, Southeast Iranian) hardly holds true
in the perspective of contemporary accurate lexicostatistical studies. The Iranian
languages are an extremely complex branch of IE languages with a glottochronological
and historical depth of at least 3000 years, similar in this respect to the Balto-Slavic
branch, but more numerous and extending over a highly differentiated geographic
territory.
Most West Iranian subgroups are closely related (cf. Modern Persian/Kurdish ~ 80%).
The close proximity of the West Iranian languages can easily be explained
by recalling that they are in many ways similar to the Romance languages:
they result from the expansion of the Median and Persian Empires since c.
800-600 BC. Although the Median Empire and the unattested Median language are sometimes
linked to Kurdish (without any clear arguments), the present research rather shows
that Kurdish is much closer to Persian. However, there is a longer lexicostatistical
distance between Modern Persian and the "Northwest Iranian" languages,
such as Zazaki (Dimli), the lesser Northwest Iranian languages (such as Harzani,
Semnani, Gorani, Kermanshahi, Sangisari), probably Parthian and Mazandarani (Zazaki/M.
Persian ~70-75%). [The results for the lesser languages have been inferred from
the consideration of phonological transitions in 1-10 numbers.]
Kurdish is closely related to Persian, Zazaki is not
This can be shown at least in the following way: (1)
wolf: Kurdish gur, Pahlavi gurg, Balochi gurkh, Persian
gorg, but Avestan varkha, Old Persian varka, Zazaki verk.
Herein, we have a very typical post-Old-Persian innovation with the word-initial
g-; (2) three: Avestan thri > se in most West
Iranian, chi in Old Persian, but Zazaki hire; (3) I:
the loss of the historical pronoun azem in many West Iranian languages
with its substitution by man, but the retention of ez in Zazaki;
(4) year: Kurdish sal, Balochi so:l, Persian
sal, but Avestan sared, Old Persian ßard, but Zazaki
serre with -r-; (5) heart: Kurdish dil, Balochi
dil, Persian del, but Zazaki zerre.
It can be seen that Zazaki is phonologically very different from Persian. Consequently,
the linguistic legend of Kurdish being related to the semi-legendary unattested
language of Media cannot hold true, although this may be true of Zazaki, Mazandarani
and some other Northwest Iranian languages, which evidently exhibit many differences
from the languages that descend from the Persian Empire.
Avestan
Avestan also demonstrates close proximity to Proto-West-Iranian.
Consider that the Zoroastrian religion would never have gained much acceptance in Persia
if it had been propagated in a language radically different from, or mutually
unintelligible with, Old Persian. A more important argument for the close link
between Avestan and Proto-West-Iranian is the lack of the East Iranian lenition
in Avestan: it does have some of it, but not enough. Cf.
Pers. bäradär, Av. brâta, but Pashto wror; Shughni verod (brother)
Pers. xahär; Av. xvaharha, but Pashto khor; Shughni yakh (sister)
Pers. doxtär; Av. duxdhar; but Pashto lûr; Bactrian logda (daughter)
Pers. atesh; Av. âtar-sh; Pashto ol; Shughni yâc (fire), etc.
We can see in these examples, that the East Iranian languages have undergone considerable
changes and exhibit little phono- or lexicostatistical proximity to Avestan. Lexicostatistically,
we have Avestan/Modern Persian ~ 78%, but Avestan/East Iranian ~60% on average
[Avestan/Shughni ~59%, Avestan/Ossetic ~59%, Avestan/Pashto ~66%, Avestan/Wakhi
~56%]. Consequently, Avestan may be a good candidate for Proto-West-Iranian.
East Iranian
This is probably the most complex and most controversial group among the Indo-European
languages. Having been studied only as late as the 19th-20th centuries, it remains
largely unknown to many Indo-Europeanists in the west. For years, researchers have
tacitly assumed that there should be nothing in Iranian that cannot be found in
Avestan, ignoring the many bizarre peculiarities of this family. It was, for instance,
poorly represented in Dyen's lexicostatistical research. The group's textbook
classification (Northeast to Southeast Iranian) is completely unacceptable and
is hardly supported by any linguistic arguments at all. In fact, a closer look
reveals a complicated branch with many different sprouts. The present lexicostatistical
study, for instance, shows that the actual difference between Russian and Lithuanian
might, in fact, be less than between Wakhi and Shughni, both of which are believed
to be "Pamir", or sometimes even called "Pamir dialects".
The
group average lexicostatistical depth of about 60% indicates that the East Iranian
languages have been hiding around the Pamir and Hindu-Kush Mountains probably
since about 1000-1500 BC, branching off into several subgroups shortly after the
period of separation of the whole Indo-Iranian supergroup. As a result, they can
be regarded as complex and probably even a rather independent taxon of Indo-Iranian
languages. The
most obvious feature of the East Iranian languages is a widespread lenition of
consonants (d > ð > l; b > v; k > c, etc.) which, by the way,
might be an early areal, rather than genetic feature. This makes East
Iranian words look a far cry from the "normal" Indo-European languages:
Cf. Ossetic ærtæ, Shughni aráy (three)
Pashto lûr, Yidgha lughdoh; Ishkashimi udoGd (daughter)
Yidgha uxsho; Sanglechi khoar; Shughni xo:gh (six)
Pashto le:wê [metathesis]; Shughni urj (wolf)
The Pamir languages may
form an internal genetic unity, apparently with the three following subbranches:
(1) Yidgha-Munji; (2) Ishkashimi-Zebaki-Sanglechi; (3) Shughni-Rushan-Sarikoli-Yazgulami.
The first two are rather closely related ((1)/(2) ~80%), while the third one is
a little more differentiated ((1)/(3); (2)/(3) ~ 70% on average), with Yazgulami
being particularly different. There might also be some speculation about relating
ancient Bactrian (the language of the Kushan Kingdom) to Yidgha-Munji, but a
precise lexicostatistical study of Bactrian is absent for lack of lexical material.
Wakhi, a language spoken in the Hindu-Kush mountains, just across the ridge from Burushaski,
is normally thought to be "Pamir", but differs from the other Pamir
languages in many respects. It exhibits less East Iranian lenition (cf. trui
"three"; sha:d "six"; ðaGd "daughter"),
and possesses certain archaic lexemes and innovative phonological formations (suk
"we"; bu "two"; pazuv "heart"; naghd
"night" cf. Av. xshap; naxtu), which demonstrates the
archaism of Wakhi. It had probably separated early on, and has been isolated from
the rest of the Iranian languages for a long time.
Yagnobi (Yaghnobi), or Neo-Sogdian, spoken only in a few villages in Tajikistan, is presently
strongly contaminated by Tajik even in basic vocabulary, which creates many difficulties
in lexicostatistical studies. However, it should be noted that there is no evidence
that it is particularly close to Ossetic, as is assumed in the Northeast-to-Southeast
textbook classification.
Ormuri
and Parachi have been excluded from the calculations due to insufficient material,
yet there are reasons to believe their separation from other Iranian subgroups
is quite ancient.
Ossetic
is one of the most famous offshoots of Proto-East-Iranian that must have separated
quite early on (not later than 700 BC judging from historical assumptions). Among
other features, it is characterized by an extensive metathesis: ærtæ
< *tere (three), ævzhag < *zevag (tongue) ærvad
< *verad (brother) art <*at(e)r (fire), and
further lenitive changes (*p > f): fêd
< *ped (father) fêrt < *putr (son)
The lexicostatistical
relatedness of Modern Persian to East Iranian (62%) is nearly the same or just
slightly greater than among East Iranian languages to each other (58-59%), which
means that all Iranian languages separated from the common Iranian stem almost
simultaneously, and if the East Iranian languages constituted a genetic unity,
it was only for a relatively short period of time.
Saka Khotanese is a historically important and well-attested Iranian language of the Tarim Basin
(Taklamakan Desert) known since c. 500-700 AD, but almost completely forgotten
in most classical Indo-European studies. It probably has nothing to do with the
ancient Sakas, but the name has stuck and is unlikely to change. For all practical
purposes, we could think of Khotanese (in the south) and Tumshuquese (in the north)
as the "Taklamakan" Iranian languages, not "Sakan"; at least
this name would be more self-explanatory. There also existed several other languages
of this branch, although they are poorly attested.
The Khotanese/Avestan lexicostatistical relatedness of ~ 73% corresponds to the glottochronological
separation of about 2000 years prior to the mean dating of Khotanese (600 AD)
and Avestan (600 BC), that is c. 2000 BC. This separation depth matches the average
relatedness of Khotanese / Modern Iranian languages = (66 + 64 + 59 + 60 + 56
+ 62 + 54) / 7 ~ 60%, which should be adjusted by a coefficient of about 0.9 to
correct for the early dating of Khotanese (500-700 AD), thus yielding ~54%, or
again 1800 BC.
This
means that Proto-Saka could have separated from other Iranian languages at a very
early stage, probably as early as the period of existence of the Oxus civilization;
therefore it should be regarded as a separate Iranian group, which is also phonologically
corroborated by the lack of East Iranian lenition, and geographically, by the
great distance from the West Iranian languages.
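The averaging and dating correction described above for Khotanese can be reproduced as simple arithmetic. The sketch below uses only the seven percentages listed in the text and the stated coefficient of about 0.9. The variable names are hypothetical, and no glottochronological calibration is implied beyond what the text itself states.

```python
# Khotanese vs. modern Iranian languages, percentages as listed in the text.
khotanese_vs_modern = [66, 64, 59, 60, 56, 62, 54]

# Plain arithmetic mean over all listed pairs.
average = sum(khotanese_vs_modern) / len(khotanese_vs_modern)

# Coefficient of about 0.9 (taken from the text) to correct for the
# early attestation of Khotanese (500-700 AD).
DATING_COEFFICIENT = 0.9
adjusted = average * DATING_COEFFICIENT

print(f"average {average:.1f}%, adjusted {adjusted:.1f}%")
```

Rounded, this gives the ~60% and ~54% quoted in the text. Converting these percentages to calendar dates depends on the calibration used in the study, which is not reproduced here.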
Iranian in general
As
to the Iranian languages in general, they do share many often unique lexical,
semantic and phonological innovations which prove the existence of a rather long
historical period of a common Proto-Iranian state.
Avestan chasman; Khotanese ceima; Persian cheshm; Wakhi cözm; Yidgha cam; Shugni cem; Ossetic cæsht (also Sanskrit chakshus.h) (eye)
Avestan xshap; Persian shab; Pashto shpa; Yaghnobi xishap; Ishkashimi sab, sxab; Shugni shab; Ossetian æxshæv <*xeshev (but also archaic Av. naxturu 'nocturnal'; Wakhi naGd) (night)
Avestan raocah 'daylight'; Persian ruz; Zazaki roje; Pashto wradz; Wakhi rwor; Sanglechi rusht (day)
Avestan asanga 'stone'; Khotanese samga; Persian sang; Yaghnobi sank; Sanglechi song (stone)
Avestan gaosha; Persian gush, Pashto Gvazh; Yaghnobi Gu:sh; Wakhi ghish; Shugni ghox; Ossetic x"ush (ear)
Avestan taoxma; Persian tokhm; Wakhi tuxm murG; Shugni tarmurx (egg)
These lexical items can be seen as typically Iranian, indicating that Proto-Iranian
has existed as a single unity for a time long enough to produce local innovations
even in a short 46-list. See also Starostin's similar
dendrogram of Iranian languages, which
tends to confirm the conclusions of the present study as far as the tree structure
and lexicostatistical percentages are concerned.
However, it should be noted that Starostin's "recalibrated" glottochronology
often yields too early dates and is probably incorrect. For instance, he gives
(-620) for the separation of West Iranian, whereas we know from history that
the Median Kingdom was first mentioned in 836 BC and its language is normally
believed to be West Iranian; as a result, we have an obvious contradiction.
Armenian
The
position of Armenian is seen herein as highly controversial, and its discussion
has been excluded from the present notes. It may very well be related to Indo-Iranian
languages, not Proto-Greek, as many people assume.
Nuristani, Dardic and Indo-Aryan
The Nuristani-Dardic branch (~65%) seems
to be as internally close as Balto-Slavic [although Khowar shows many dissimilarities
from other members of the Dardic group]. The same is true for the mainstream Indo-Aryan
languages (~65%). Kashmiri
does not seem to be Dardic, and was herein included into the mainstream Indo-Aryan
subgroup. Note the considerable difference between Sinhalese and Hindi-Kashmiri
(~55%), which indicates that the Sinhalese-Maldivian subgroup must have been a
very early offshoot. The separation of Proto-Nuristani-Dardic from the
main Indo-Aryan branch seems to have occurred at the depth of 54% which must correspond
to roughly 1800-1600 BC (the archaeological and historical date normally associated
with the "Aryan invasion"). To appreciate the shared Nuristani-Dardo-Indo-Aryan
(or "Indic", for short) phonological transformations and lexical innovations,
consider the following examples from the 46-list: Kati dits;
Kalasha Jhiph; Skr. jihvha:; Hindi jibh;
Sinh. diva (tongue), as opposed to Avestan *hizva:s;
Pers. zabân, Bactrian ezbago; Pashto
zhêba; Yaghnobi zivok; Shugni zev. [jh
>d : z] Kati su; Kalasha súri;
Skr. su:ryaH; Hindi su:rey (sun), as opposed to Avestan
hvar-z; Pers. khurshed, Yaghnobi khur; Shugni
xer; Wakhi yir [s : x] Kalasha hía;
Skr. h'Rdaya; Hindi hridey; Sinh. <hardaya>;
Dhivehi hi-iy (heart), as opposed to Zazaki <zerrî>;
Pashto zrrê; zaru; Shugni zrað;
Ossetic zhærdæ; (but cf. Kati ziri <
an Iranian loanword?) [h : z] Kati ango; Khowar
angar; Kalasha angár; Skr. agniH, àNg'ara;
Hindi a:g (fire), as opposed to Avestan âtar-sh;
Pers. atesh, Ossetic art < *atr; Pashto
or; Yaghnobi ol; Yidgha yur; Shugni yâc.
[-ng- : -t-/0 ?] Kati
kor; Kalasha ka; Skr. karNa; Hindi
kan; Sinh. kana (ear), probably akin to Avestan karana
'side, flank'), as opposed to Avestan gaosha; Persian gush,
Pashto Gvazh; Yaghnobi Gu:sh; Shugni ghox
(semantic innovation).
Kati radur; Kalasha rat; Skr. ra:tra; Hindi
ra:t; Sinh. <raeya> (night), akin to Lith. /ri:ta/
'morning'; Pers. ruz 'day') as opposed to Av. xshap; Pers.
shab; Pashto shpa; Yaghnobi xishap; Wakhi
naGd. (semantic innovation)
Again, as in the case of Proto-Iranian, from the great number of shared features within
a short word list, we can deduce that Proto-Nuristani-Dardo-Indo-Aryan
(Proto-Indic) had a prolonged period of separate existence, at least
2000 years long.
Hypothesis: Nuristani are part of (or close to) Dardic
You can see from the examples above that the Nuristani languages (such
as Kati (Kata-viri), Kami (Kam-viri), Wasi) are clearly related to Indic, since
they inherit the same transformations and innovations, and thus cannot be seen
as "intermediate" between Indic and Iranian, as sometimes claimed. They
also seem to share some common phonological and semantic formations with the Dardic
languages, and can hardly be viewed as radically separate: (1) Kati
g'u; Khowar g'oG; Kalasha goík (worm);
(2)
Kati uts; Khowar awa; Kalasha a (I)
(as opposed to Skr. asmad; Kashmiri bu, boh';
Hindi me; Lahnda mae; Bengali ami;
Sinh. mama) (probably, an early loss of the second part of *as-mad); (3)
Kati nu; Khowar nan ; Kalasha áya
(mother) (curiously, probably akin to Eng. "nanny" as also to a
similar word in Eastern Iranian languages) (as opposed to Skr. ma:tar,
ma:ta:; Kashmiri moju; Hindi ma:, ma:ta:ji;
Sinh. <mava>). The overused objection to children's
words is seen as exaggerated herein: words like "mother, father, nanny"
are quite normal words; they are not easily re-created from scratch each time
in each language; (4)
Kati sh'üt; Khowar chuti; Kalasha chom
(earth) (as opposed to Skr. mahi:; Kashmiri metsu,
boh'; Hindi mitti 'clay'; Bengali mati;
Gujarati mati; Sinh. <pas>, <poloova> ) (apparently,
akin to East Iranian: Ishkashimi shit; Sarikoli sit;
Yazgulami shat; Ossetic sêdJêt)
However, the shared innovations in question are few and may have formed
independently because of an Iranian adstratum, borrowings or by other means.
Proto-Indo-Iranian
The Proto-Indo-Iranian language existed a long time ago and/or was rather
short-lived. This conclusion may be drawn from the fact that relatively few traces
that could demonstrate the existence of Proto-Indo-Iranian remain in the modern
languages within the 46-list under consideration. The uniquely similar words include: (1)
Av. âf-sh, ap; Pr. âb; Pashto obê;
Yaghnobi op; Wakhi yupk; Kami oa, op;
Skr. a:paH > paniya (?);
(akin to Lith. /upe/ 'river'). This is a semantic innovation, which
was created probably because water was closely associated with rivers in desert
Central Asian regions, hence the semantic transformation "river" >
"water"; it's more likely, however, that this lexeme is only present
in Iranian, whereas its appearance in Indic is recent, cf. Sinh. <vatura>.
(2) Av. bu:mä; Old Pr. bu:mis; Kurdish
bin; Ormuri (Logar) bouma; Kami. b'üm;
Khowar b'um; Skr. bhu:miH; Hindi bhu:mi;
Sinh. bin (earth);
(3) Av. masya;
Pr. mâhi; Skr. ma:tsya; Hindi machhi;
Marathi masa; Gujarati macheli; Sinh. malu
(fish); probably akin to Lith. mesa; Eng. meat
(not in the 46-list); the introduction of this word may indicate that fishery
was an important component of Indo-Iranian subsistence.
As we have seen, the independent changes in Proto-Indic and Proto-Iranian are very
pronounced and they share few common innovations, which indicates that both languages
have existed separately from each other for some considerable amount of time,
and no longer have much in common (~40% in the present study). Glottochronologically,
from the considerations of the present study, they could have separated c.
3000-3500 BC, which is about 1000-1500 years earlier than usually assumed.
That would mean that the Proto-Indo-Iranians entered Central Asia soon after 4000
BC (see Map
of Indo-Iranian Migrations), quickly migrated along the Oxus valley,
reached the Hindu-Kush and Pamir mountains, where the early Proto-Indic language
completely separated by 3000 BC, penetrating the mountain ridges, and staying
there with some internal differentiation until about 1700 BC when the Indo-Aryan
languages finally began to migrate into northern India. Although this is not reflected
in this study, it is also plausible to assume that Indo-Aryan per se had initially
been a subbranch of the Dardic languages that expanded into the Indian subcontinent.
But then, why do we often hear about the close proximity between Sanskrit and Avestan? The probable
explanation is that the classical "dictionary Sanskrit" is not
a real language, it is rather a quasi-etymological collection of lexemes which
belong to different Indo-Aryan dialects from different periods; whereas the earliest
Vedic Sanskrit, which has been passed down orally for many generations, is even
more confusing and sometimes not even entirely decipherable. In any case, the
classical Sanskrit cannot be seen as something of a Proto-Indo-Aryan, since it
was basically an artificial conlang created by Panini, and then lexically expanded
over the course of many centuries. Consequently, a casual comparison with Avestan
may produce many synonyms and many obscure parts in Vedic Sanskrit texts which
can be interpreted in different ways and thus provide a superficial impression
of a close relationship between Sanskrit and Avestan.
On the other hand, in the present study, only real languages of the same period can
be compared, which helps to uncover the lack of common lexical background between
Iranian and Indic languages and offsets the Indo-Iranian separation further back
in time. Similar difficulties of finding the common Indo-Iranian proto-state were
also noted in other lexicostatistical studies of modern languages, first by Dyen,
Kruskal, Black (1992) who complained about the "absence in the present
classification of an Indoiranian group" and then by Ringe et al. (2005).
Nevertheless, this rather significant question remains to be investigated further
in more detailed research.
Anatolian
The current study confirms the early separation of Hittite. This is evident from the
following considerations. Comparing Hittite to Latin (52%)
(attested c. 100 BC), Attic Greek (51%) (400 BC), Avestan (50%) (600 BC), and Sanskrit
(~50%) (400 BC) yields nearly equal results, which means that Hittite seems to
be equidistant from the other Indo-European groups.
Since Hittite is dated to c. 1600 BC, there would be even fewer matches if it had existed
for 1300 years longer, down to the time of Latin and Avestan; therefore, this average figure of
~50% should be further reduced to about 45% of relatedness to most Indo-European
groups. This result shows more differentiation than in the normal Indo-European
pairs: Latin/Greek (67%), Greek/Avestan (57%), Latin/Sanskrit (~57%). Glottochronologically,
that figure would translate to about 5000 years before the Latin/Greek/Avestan/Sanskrit
separation (c. 300 BC), or circa 5300 BC (see below).
Therefore, we repeat the conclusion that the Anatolian group should be regarded separately from
the mainstream Indo-European languages, which supports the hypothesis of Indo-Hittite
(Indo-Anatolian).
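The conversion from a percentage to a time depth used in this paragraph can be sketched as follows. This is only an illustration: it assumes the per-millennium retention of about 0.85 that this study derives further below, and the function and variable names are hypothetical.

```python
import math

# Assumed calibration: share of the word list retained per 1000 years,
# as derived later in this study from the Indic sample.
RETENTION_PER_MILLENNIUM = 0.85

def years_elapsed(shared_fraction, retention=RETENTION_PER_MILLENNIUM):
    """Invert shared = retention ** t (t in millennia) and return t in years."""
    return 1000 * math.log(shared_fraction) / math.log(retention)

hittite_share = 0.45      # adjusted Hittite figure quoted in the text
reference_year_bc = 300   # approximate date of the compared old languages

depth = years_elapsed(hittite_share)
separation_bc = reference_year_bc + depth
print(f"time depth: about {depth:.0f} years")
print(f"separation: about {separation_bc:.0f} BC")
```

With these inputs the time depth comes out slightly under the 5000 years quoted above, so the rounded date of circa 5300 BC should be read as accurate only to within a century or two.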
Attempting to date Proto-Indo-European
One of the common reasons for the criticism of glottochronology is the alleged insufficient
lexicostatistical distance between Modern Icelandic, Modern Armenian, and Modern
Georgian and their respective old languages. However, the critics of glottochronology
seem to ignore the law of large numbers, which states that even if some of the
languages might deviate considerably from the mean in their phono- and lexicostatistical
behavior, the arithmetic average over a large number of languages would be relatively
stable and most likely correct. On the other hand, the probability of running
into languages with considerable deviation from the mean would be rather low,
while, in many cases, the abnormal behavior of such deviant languages may be explained
and even consistently predicted using various ad-hoc assumptions, such as geographic
and linguistic isolation on a distant island (as in the case with Icelandic) or
in the mountains (as with Armenian and Georgian).
However, we will try not to overuse any ad-hoc assumptions herein. The law of large numbers
would be just enough to establish the temporal position of PIE. Here is what we
can do. We can (1) calculate the mean percentage for all of the Indo-Aryan
languages, (2) recalibrate the rest of the list using the obtained lexicostatistical
depth set to 1600 BC, the archaeologically attested date of the Indo-Aryan invasion
into India, (3) calculate the mean value for all of the Indo-European groups, and (4)
finally convert that number into an approximate date in years using the aforementioned
calibration date.
The
mean percentage of separation among Nuristani-Dardic (excluding the unreasonably
deviating Khowar), Sinhalese, and Hindi-Kashmiri seems to converge to an average
depth of about 56% [(54 + 57 + 50 + 60 + 62 + 50 + 64 + 59 + 52) / 9 = 56], which
should correspond to circa 1600 BC judging from the archaeological and historical
record (also see The
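Map of Indo-European Migration). Hence,
from the logarithmic glottochronological formula, where x is the share of the word list
retained per 1000 years and 3.6 is the number of millennia between 1600 BC and 2000 AD, we have:
0.56 = x ^ 3.6
x = 0.56 ^ (1 / 3.6) = 0.56 ^ 0.28 = 0.85
After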
some calculations, that would produce the following calibrated glottochronological
row:
| 2000 AD | 1000 AD | 0 | 1000 BC | 2000 BC | 3000 BC | 4000 BC | 5000 BC |
| 100% | 85% | 72% | 61% | 52% | 44% | 38% | 32% |
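The calibration and the resulting row can be reproduced with a few lines of Python. This is a minimal sketch under the assumptions stated in the text (a mean of the nine Indic percentages, a split dated to 1600 BC, and 2000 AD taken as the present); the names are illustrative only.

```python
# Shared-vocabulary percentages quoted in the text for the Indic sample.
indic_percentages = [54, 57, 50, 60, 62, 50, 64, 59, 52]
mean_share = sum(indic_percentages) / len(indic_percentages) / 100  # about 0.56

# Assumption from the text: 1600 BC to 2000 AD is 3.6 millennia.
millennia_elapsed = 3.6

# Solve mean_share = x ** millennia_elapsed for the per-millennium retention x,
# rounded to two decimals as in the text.
retention = round(mean_share ** (1 / millennia_elapsed), 2)

def calibrated_row(x, steps=8):
    """Expected shared percentage after 0, 1, 2, ... millennia of separation."""
    return [round(100 * x ** k) for k in range(steps)]

labels = ["2000 AD", "1000 AD", "0", "1000 BC",
          "2000 BC", "3000 BC", "4000 BC", "5000 BC"]
row = calibrated_row(retention)
for label, pct in zip(labels, row):
    print(f"{label:>8}: {pct}%")
```

Each column is simply the retention raised to the number of millennia before 2000 AD, so the whole row depends on the single calibration point at 1600 BC.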
This table function
seems to be more or less consistent with the following historically attested facts
and plausible assumptions:
(1) (very approximately) with the attribution of Proto-Celtic (Irish/Welsh ~ 52% or
57%, as corrected for synonyms -- see above) to c. 1500 BC and the Late
Bronze Age Urnfield culture (1300-750 BC);
(2) with the separation of Hellenic from Italic occurring before 1600-1900
BC, when the Proto-Greek tribes must have entered Greece. Glottochronologically,
the Attic Greek/Latin relatedness (69%) corresponds to about 2300 years before
200 BC (a mean value between the approximate dates of the Greek and Latin languages),
thus yielding c. 2500 BC for the late Helleno-Italic proto-state;
(3) with the Baltic (~75%) expanding around 200 AD, just a little earlier than
late Proto-Slavic (c. 450-500 AD), because the lexicostatistical distance between
Lithuanian and Latvian in a more accurate lexicostatistical study by Mazhulis
(1994) using Swadesh-200 is just slightly greater than that among the Slavic languages,
whereas the period of the Proto-Slavic split seems to be historically
datable to 400-500 AD; therefore, 0-200 AD seems to be a plausible value for the
separation date of Lithuanian and Latvian-Latgalian;
(4) with the likely separation of Proto-Zazaki from Persian (70-75%) soon after the
end of the Old Persian period (300-500 BC);
(5) with Ossetic separating from other East Iranian at 60% (~1100 BC). This dating
looks right, because the Scythian languages are attested in the Caucasus Mountains
just circa 800 BC, and their migration from the Pamirs must have been relatively
quick (because of the horse-drawn carts) and must have occurred at an early stage;
(6) with the existence of the BMAC civilization along the Oxus river during 1800-2300
BC, which should probably be attributed to the Proto-Iranian state, apparently
located along the Oxus as well (see The
map of Indo-Iranian Migration). According to the present calculations,
the era of Proto-Iranian would roughly correspond to 60-40%, thus embracing the
period from 1100 to 3000 BC, which includes the period of the BMAC as a
subset;
(7) with the diversification of West Iranian languages (70%, 200 BC) after the fall of
the Persian Empire by 330 BC.
Calculating the approximate upper date for late PIE:
Now
that we have obtained the glottochronological table function, we can use the figures
for the Greek/Avestan (57%), and Latin/Sanskrit (~57%) relatedness to place the
upper limit for Proto-Indo-European at the level of 3500 years before the average
dating of Avestan (600 BC), Sanskrit (400 BC), Latin (100 BC), and Greek (400
BC), or circa 3900 BC. We can also obtain a similar number starting
from modern languages:
Celtic / Indo-Aryan = (38 + 39 + 42) / 3 ~ 40%
Celtic / Balto-Slavic = (32 + 32 + 38 + 37 + 44 + 35) / 6 ~ 36%
Balto-Slavic / Indo-Aryan = (38 + 36 + 35 + 32) / 4 ~ 35%
Balto-Slavic / Iranian = (39 + 38 + 28 + 35) / 4 ~ 35%
European / Iranian = (39 + 37 + 30 + 34) / 4 ~ 35%
European / Indo-Aryan = (29 + 34 + 39) / 3 ~ 34%
PIE ~ 36-35% or circa 4400 BC
The conclusion is that PIE must have separated into early Indo-European dialects by
circa 4100 BC, which is in rather good correspondence with Gimbutas' theory.
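Both estimates can be recomputed from the figures given above. The sketch below assumes the per-millennium retention of 0.85 obtained earlier in this study and takes 2000 AD as the present; the helper names are hypothetical. Because the text rounds at every step, the unrounded results differ from the quoted dates by up to about a century.

```python
import math

RETENTION_PER_MILLENNIUM = 0.85  # assumed calibration from this study

def years_before(shared_percent, retention=RETENTION_PER_MILLENNIUM):
    """Years of separation implied by a shared percentage."""
    return 1000 * math.log(shared_percent / 100) / math.log(retention)

# Old languages: 57% shared, counted back from their mean attestation date.
attestation_bc = [600, 400, 100, 400]  # Avestan, Sanskrit, Latin, Greek
mean_attestation_bc = sum(attestation_bc) / len(attestation_bc)
ancient_estimate_bc = mean_attestation_bc + years_before(57)

# Modern languages: pairwise percentages between branches, as listed in the text.
branch_pairs = {
    "Celtic / Indo-Aryan": [38, 39, 42],
    "Celtic / Balto-Slavic": [32, 32, 38, 37, 44, 35],
    "Balto-Slavic / Indo-Aryan": [38, 36, 35, 32],
    "Balto-Slavic / Iranian": [39, 38, 28, 35],
    "European / Iranian": [39, 37, 30, 34],
    "European / Indo-Aryan": [29, 34, 39],
}
branch_means = {name: sum(v) / len(v) for name, v in branch_pairs.items()}
overall_mean = sum(branch_means.values()) / len(branch_means)

# Counted back from 2000 AD, so subtract 2000 to express the result in years BC.
modern_estimate_bc = years_before(overall_mean) - 2000

midpoint_bc = (ancient_estimate_bc + modern_estimate_bc) / 2
print(f"from old languages:    about {ancient_estimate_bc:.0f} BC")
print(f"from modern languages: about {modern_estimate_bc:.0f} BC")
print(f"midpoint:              about {midpoint_bc:.0f} BC")
```

The midpoint of the two estimates lands close to the circa 4100 BC stated in the conclusion.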
Does any of this agree with other models?
Does any of this agree with other researchers' models? Sometimes, it does.
See Vaclav Blazhek, On the internal classification of Indo-European languages: survey (2005)
(1) Eric Hamp (1990)
We have some essential agreement with the non-lexicostatistical model by Eric Hamp (1990), who based his classification
on specific isoglosses in phonology, morphology and lexicon. For instance, he
also tends to place Balto-Slavic in the same group with Centum, a possible, purely
lexicostatistical conclusion of this work. He also agrees that Thracian is an early Balto-Slavic
offshoot. He seems to misplace Greek though, because of its alleged proximity
to Armenian (a question I have not addressed herein). Otherwise, his conclusions
are rather traditional.
(2) Starostin (2004)
There is some
interesting agreement with Starostin's glottochronological study (2004). Note
that the counting and calibration methods in this lexicostatistical study
were completely different. Starostin has: -4600 [my -5300] for the Anatolian
separation; -3800-3300 [my -4100] for the mainstream Indo-European languages
separation; -1200 [my -700] for Balto-Slavic; -80 [my -200] for Latvian-Lithuanian;
-1000 [my -1900] for Brythonic-Goidelic; -250 [my -700] for late Proto-Indo-Aryan;
-1200-700 [my -1100] for late Proto-Iranian; +180 [my -200] for Shugni-Munji(Yidgha)-Ishkashimi
(he also found the early separation of Wakhi (-500), which surprised me as well;
and correctly identified the long separation of Ormuri-Parachi, etc, see The
dendrogram of Iranian languages); +300 [my -100] for late West
Iranian; -1100 [my -1700] for late Proto-Dardic (he also noticed the early
separation of Khowar and Kalasha); etc. None of these is too far from the
figures in the present study. At least, we have some basic, fundamental agreement
here. You can also notice that Starostin has a smaller offset value, so it is
basically a matter of calibration (whereas my calibration method was very rough
and approximate in this work, so I don't even insist on it; it's not even
the aim of this work to elaborate on a correct glottochronological calibration,
because I was mostly interested in percentage values and internal relatedness of
various subbranches). The rest of Starostin's cladistics seems to be skewed,
apparently because he relied too much on statistical calculations in short word
lists, which are not always sufficiently accurate to produce an error margin small
enough for building a correct dendrogram, when the separation times are much too
close (a common problem in statistical phylogeny). To avoid this common error,
I simply put an honest I-don't-know and relied on classical conclusions and rough
approximations, whenever I felt there may be something wrong with the statistical
side. On the other hand, some of his dates may in fact be more accurate, because
I used a very small lexical base for just a small number of languages.
2008-2012
|