1st October 2018
by Alberto

Ancient DNA and Linguistics: an introduction

The ancient DNA (aDNA) era has been a revolution (still in progress) in our understanding of prehistory, and one of its implications has been the search for the origin of languages (mostly the origin of IE languages, which has been a central subject in many aDNA publications). However, I have the feeling that linguists themselves are not very actively taking part in it, or taking advantage of the wealth of new data that has come to light so far and continues to come. Here I’ll outline some of the basics of how the available data can already help linguistic research and point to a few of the future developments to look forward to.

Language families, mutation rates, divergence, convergence… The lesson from genetics

Up until now, linguists have mostly relied on the analysis of languages themselves in their studies of language families that go back into prehistory. Not that they have ignored the archaeological data, but admittedly archaeology could not provide the level of information about ancient populations that aDNA can, so their reliance on archaeological data was quite generic and did not impose any serious constraints on their linguistic research. Therefore, language families and their family trees have been based mostly on linguistic data alone, including statistical analysis with computer programs relying (to a degree) on mutation rates. So I’d like to discuss this a bit in the light of aDNA to see how these things can correlate.
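To make the "mutation rate" idea concrete, here is a minimal sketch of the oldest and simplest clock-based method, Swadesh-style glottochronology. This is not the Bayesian method used by the programs mentioned in this post, only an illustration of the same underlying assumption. The retention rate of 0.86 per millennium is the value commonly quoted for the 100-item word list and is used here as an assumption; the function names and the 20% contact figure are made up for the example.

```python
import math


def expected_shared_fraction(years, retention_per_millennium=0.86):
    """Share of core vocabulary two sister languages still have in common
    after `years` of independent change, under a constant retention rate.
    Both branches lose words, hence the factor 2 in the exponent."""
    return retention_per_millennium ** (2.0 * years / 1000.0)


def glottochronology_years(shared_fraction, retention_per_millennium=0.86):
    """Invert the clock: t = ln(c) / (2 ln(r)), returned in years."""
    if not 0.0 < shared_fraction <= 1.0:
        raise ValueError("shared_fraction must be in (0, 1]")
    millennia = math.log(shared_fraction) / (2.0 * math.log(retention_per_millennium))
    return 1000.0 * millennia


# Pure "isolation" scenario: the clock recovers the true split time.
true_split = 1000.0
shared = expected_shared_fraction(true_split)
print(round(glottochronology_years(shared)))

# Contact scenario (hypothetical): one language additionally replaces 20% of
# the shared core vocabulary through borrowing. The clock reads this as time.
shared_after_contact = shared * 0.8
print(round(glottochronology_years(shared_after_contact)))
```

The second estimate comes out much older than the true split even though no extra time has passed, which is the kind of distortion contact introduces into any method that treats change as a steady clock.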

The first lesson to learn comes from the way genetics have evolved in Eurasia, since to some degree languages develop in a similar way. We now have a decent idea of how AMH have evolved genetically since their main Out of Africa event to the present. Very roughly, we’d have two clearly different phases:

The Upper Paleolithic one, where populations were venturing into different areas of Eurasia previously uninhabited by AMH, and where divergence between these populations, due to mutation rates/drift and lack of contact in such a vast area, is what led the genetic development into several “very” (in the Eurasian diversity context) divergent branches that could for the most part be reconstructed in a tree-like structure.
The second phase would be the Holocene, in which the improvement of the weather conditions and, more importantly, the transition to food production (agriculture) allowed for population growth and expansion. In this second phase mutation rates and drift become an irrelevant part of genetic evolution, and instead of divergence we shift to convergence dynamics driven by population movements and contacts (admixture either through specific migrations or just through what is sometimes, not very intuitively, called Isolation-by-Distance). Here there’s no valid tree-like structure to explain the genetic evolution. You have to track the specific population movements, their geographical location and contacts, and disentangle all the processes that led to modern populations being a big mix of UP ones. Any West Eurasian PCA will clearly show the UP/Mesolithic samples on the outside and the modern ones on the inside, surrounded by the former.
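The PCA pattern described above has a simple geometric reason, which can be sketched in a few lines. Under the simplifying assumption that an admixed population's position is just the weighted average of its sources' positions (ignoring later drift), any mix must fall inside the shape spanned by the sources. The coordinates, labels and weights below are invented for illustration and are not real PCA values.

```python
def mix(sources, weights):
    """Position of an admixed population as the weighted average of its
    source populations' 2D coordinates (weights must sum to 1)."""
    if abs(sum(weights) - 1.0) > 1e-9 or min(weights) < 0:
        raise ValueError("weights must be non-negative and sum to 1")
    return tuple(sum(w * p[i] for w, p in zip(weights, sources)) for i in range(2))


def _cross(o, a, b):
    """z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def inside_triangle(p, a, b, c):
    """True if point p lies inside (or on the edge of) triangle a-b-c."""
    d1, d2, d3 = _cross(a, b, p), _cross(b, c, p), _cross(c, a, p)
    has_negative = min(d1, d2, d3) < 0
    has_positive = max(d1, d2, d3) > 0
    return not (has_negative and has_positive)


# Three hypothetical, strongly diverged source populations (toy coordinates).
source_a, source_b, source_c = (0.0, 0.0), (4.0, 0.0), (0.0, 4.0)

# A hypothetical later population that is 50% / 30% / 20% of the three.
modern = mix([source_a, source_b, source_c], [0.5, 0.3, 0.2])
print(modern, inside_triangle(modern, source_a, source_b, source_c))
```

However the weights are chosen, the mixed population cannot land outside the triangle, which is why repeated admixture pulls later samples towards the middle of the plot while the old, diverged groups stay at the edges.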

When it comes to languages we have to take these patterns into account, because they are the main driving forces in language evolution, just as they are in population genetics (and I’ll stress that this is something new in population genetics, discovered only in the last 4-5 years with the advent of ancient DNA, which forced us to throw away most of the previous population genetic beliefs – a similar revolution awaits linguists when they start to catch up with all this new information).

Nostratic?

The Nostratic hypothesis is a controversial one that postulates a macro-family of languages all sharing a common origin and then diverging from each other with time. Obviously this does not mean a common origin going back to the main Out of Africa event (which, yes, is likely the common origin of all Eurasian languages, but that’s way too deep in time). We even have the much more recent main “Into America” event, some 17 kya, with a small population (last time I checked it was estimated at around 70 individuals) that obviously spoke one single language before expanding throughout the new continent(s), and even then it clearly doesn’t work, since there are many dozens of language families that cannot be related to each other even though we know they once were. Nostratic (at least in its modern conception, for which I’ll follow Allan Bomhard as the main proponent of the hypothesis) refers to languages of West Eurasian origin going back to the early Holocene and expanding probably with agriculture. Let’s start with this image:

If we look for example at the Eurasiatic language subfamily, we find it somewhere around 6000 BCE in Southern Central Asia and expanding from there to the NE (Altaic), NW (Uralic) and West (IE). It is indeed an interesting proposal from a theoretical point of view. But how does it stand in the new reality brought by aDNA?

First, let’s make it clear that saying languages are genetically related means that a parent language existed at some time, in some place, spoken by real people, and that the dispersal of the people (or of the language alone, without explicit migration, by cultural diffusion) is what made it split into different languages.

So who were these proto-Eurasiatic speakers? Do we have any evidence of them being in that place at that time and expanding in those directions? For anyone who has followed the ancient DNA developments the answer is very clear: No. Such a population didn’t exist in that place at that time expanding in those directions.

I do understand that Bomhard’s proposal is just a proposal about the possible place (not so much about the time, which should be more accurate), and that he is probably not strongly tied to that geographical location. But the problem is not the specific geographical location. Trying to find any other that is compatible with what we already know will yield the same result: No, it didn’t happen. It’s already difficult enough to find a population that spoke PIE, so try to find a common ancestor of that population and the ones who would become proto-Uralic and proto-Altaic, some 2000 years earlier than whatever PIE population one postulates.

What does this mean? Simple: that such languages are not genetically related. They have different origins.

So how do we explain their similarities? THAT is a good question, and one that needs new answers. If (pre-)Altaic ultimately has an origin in, let’s say, East Eurasia (North China?) and IE (and most likely Uralic) has one in West Eurasia, why did Altaic languages (and especially Turkic) come to resemble those West Eurasian ones?

Generally speaking, 2 languages can have more than accidental similarities either through a genetic relationship or through convergence (isn’t this what happens in population genetics too?). The mechanisms of convergence are various, and more or less well known, but unfortunately they are not properly weighted and are mostly considered “add-ons” rather than a main driving force in language evolution. This is an area that needs a real breakthrough if linguistics is to catch up with reality.

If someone is jumping to the conclusion that I’m proposing that languages are basically creoles, no, that’s not what I’m proposing. Creoles are a different phenomenon that can be seen in some communities, for example in North America, where people mix Spanish and English while speaking, even in the same sentence. A language like English, on the other hand, is not a creole. It’s not a mix of Germanic and Latin/Romance. It’s a Germanic language, no doubt, but with strong Romance influence (Nordic too, but that’s Germanic). But what happens when we go deeper in time and influence starts to permeate some of the core vocabulary of each language? Is it still as easy to tell what is what? No, linguistic reality shows us that often the borders get too blurred to really know (just as genetically related languages, like those in America, simply become too different to be recognised as genetically related anymore).

And while talking about English (which is a simple example for everyone reading this), let’s take a look at it too. How did it become mutually unintelligible with Continental and Nordic Germanic, and how long did it take? As for the second question, it depends on how mutually intelligible Old English (7th to 11th centuries?) might have been with other Germanic speakers of the time (either Old Frisian or Norse). If it was indeed still intelligible, then the process happened in the subsequent centuries of Middle English (12th to 14th-15th?). Whatever the case, that’s quite a fast process, and it clearly didn’t happen because of isolation and mutation rates. It happened due to strong influences from other languages with which it came in contact. Compare this to the quite exceptional case of Icelandic, one of the few languages that we can say evolved in the “old” (Upper Paleolithic) way, by becoming geographically isolated with limited contacts with the parent language after splitting, with mutation rates being its main driving force of evolution (unless I’m missing something, which could be, since I’m not well acquainted with Icelandic history).

Another interesting example could be the evolution of Romance languages from Latin. This is a question that has been disputed for probably centuries by now. The basic problem comes down to how it can be explained that Romance languages have features in common that were not present in Latin (prepositions and articles instead of the Latin endings, loss of neuter gender…). Linguists have tried to answer this with different theories, starting with the simplest one:

The changes happened prior to the expansion of the language. Yes, simple as this explanation is, it could never stand scrutiny. Not only would it require that throughout the whole Roman period Latin was a dead language used only in written documents (and not only literary ones, but administrative ones too, with many people having to study a language that was difficult for them in order to be able to write it), but there’s hardly any evidence of such a language (proto-Romance) in any written form of any kind (well, there are small things written about mistakes that commoners made that resemble later Romance languages and the like, but without entering into long details about this, let’s just say this disproves rather than proves the existence of such a language). Moreover, we have Romanian, which does have some of the Latin features lost in the other Romance languages (like neuter gender).
The substrates theory: basically arguing that the different substrates in the territories in which Latin expanded would explain the differences with Latin. But again, this never worked because we have a fairly good idea of the kind of languages spoken in those territories (Celtic ones, for the most part, and other non-IE ones) and they neither explain the changes nor do they correlate with Romance languages in any significant way.
The superstrate theory: this one tries to explain the evolution of Romance due to the different Germanic tribes entering different parts of the western Roman empire. I don’t think this one deserves further comment.
Others: like the role of the Christian church in its different territories as a driving force of the language changes/preservation and other exotic ones that I don’t even remember.

Doesn’t all this sound like someone is missing something that might be important?

Practical consequences?

This is not just theory, and to an extent not a very new one; it’s the practical consequences of it that really matter. Many of these I will write about in future posts (this is why I called this an introduction), and some tidbits I’ve mentioned in past ones, like here about the Scythians: Now that we know the place and time of the Scythian genesis (ca. 1600-1500 in Central Asia) and their cultural and genetic relationships (mostly with BMAC, with some Siberian), we know with a high degree of certainty that at that time and place their language had to be a very early form of Indo-Iranian which, even if already in the Iranian branch (not very clear), had to be closer to Sanskrit than to any other known language (including Avestan). We also know that they migrated west and replaced/assimilated the Srubnaya culture, which didn’t have any direct descendants (Sarmatians were just Scythians who moved west carrying their language). So this has specific consequences for the analysis of surrounding languages, chiefly Uralic and Balto-Slavic ones (regarding the latter, I already explained there how, knowing the place and time of the putative split, a genetic relationship with Indo-Iranian is inconsistent with the data; but more important is the fact that around 1500 years of early Indo-Iranian spoken throughout the steppe, in the neighbourhood of the people who would become Balto-Slavic, cannot just be ignored in any analysis).

Something similar applies to Uralic languages. Given not only the extreme difficulty of a genetic relationship with PIE, but also the difficulty of contemporary and contiguous homelands influencing each other (or at least PIE -> PU), it’s mostly later contacts that take the main role, so whom the speakers could have interacted with, and the language those people spoke, is a fundamental part of the question. In due time, when the upcoming aDNA is released, I’ll try to devote a post to Uralic in the aDNA context, because it will clarify the situation by telling us the few realistic options that are left (and it’s not surprising that CWC-related cultures are becoming increasingly popular as the preferred option for pre-Uralic, since for now it’s the only clear candidate. But I don’t want to take part in this debate for now. Let’s wait to see if there are other realistic candidates and if they’re better ones than CWC-related people).

And I can’t finish this post without referring to Basque. I will write a separate post about it in the coming weeks, but I’d like to outline here a few basic things. Basque has been a very mysterious language for linguists and historians alike. There have been so many theories about it (colourful and exotic ones included) that it would be tiresome to even name them. Fortunately, aDNA has come to the rescue leaving us with an amazingly simple scenario: There are only two realistic options for the origin of Basque:

It was brought by Early European Farmers (EEF) during the Neolithic, ultimately from the Near East (Anatolia or beyond)
It was brought by the Bell Beaker folk during the early Bronze Age, ultimately from the steppe/North Caucasus (more details about this here).

Technically there is a 3rd option: that it was the language of WHG, but that’s such a low probability one that it does not deserve further investigation. Everything else (a late arrival from North Africa or the Eastern Mediterranean, a relict from the Cro-Magnon people, etc.) is just fantasy.

(By the way, I’ll expand on the Ibero-Vasconic subject in the mentioned future post, but let’s just say here that in recent years, thanks to some advances especially related to the numerals in Iberian, it’s become mostly accepted that we’re talking about closely related languages, in the numerals specifically more closely related than many pairs of IE languages, and likely not due to borrowing but to a deeper relationship between them – unsurprisingly).

I don’t want to debate here about the likelihood of the 2 options mentioned above, since that would bring undesired controversy to this post (I’ll do so in that next one), but let’s see a few consequences of this:

Were (Ibero-)Vasconic languages spoken over a larger area at some point? It’s been a debated issue that is no longer debatable. The answer is very clear now: obviously yes. Or does someone think that Aquitanian fell from the sky in South Western France? No, related languages were once spoken all over Europe. There’s no way around that. And depending on which of the 2 mentioned populations brought it to Western Europe, they would have also been spoken in the Near East or in Easternmost Europe (the steppe). Whether it’s possible to recover this substrate is a different matter. Honestly, substrates from long gone languages are a very complicated issue. I think that once we know that such a substrate existed at some point, it’s worth a collective effort from every linguist to try to find it (knowing what to look for and where), because that will help clarify the linguistic history of Western Europe. Whether that will give any results is something I cannot answer. Maybe it won’t. Maybe it will. But just dismissing Vennemann’s attempt as an extravagant one is clearly not going to be productive or take us anywhere.
And where are the related languages? This again depends on who brought the language to Western Europe. If it was the Bell Beakers, one should look at languages like the North Caucasian ones, especially NE Caucasian (and Hurrian-Urartian), since Lezgins or Tabassarans are probably the most direct descendants of the pre-Yamnaya population from the North Caucasus steppe (we’re still waiting for Hurrian aDNA, Hurrian being a possible ancestor of NE Caucasian. Given that we know that someone brought R1b-Z2123 to the Near East starting ca. 2400-2300 BCE, and that it was not Kura Araxes, the time and place of appearance of the Hurrians might make them the best candidates for it. Still just my speculation, but if someone sees a better candidate please comment about it). Also Uralic. And more distantly other North Eurasian languages possibly influenced by the steppe (now surviving basically in Siberia). If instead it was the EEF who brought the languages, then clearly Afro-Asiatic (and especially Semitic) would be the best candidates (with Kartvelian or maybe IE, if proved not to be from the steppe). Again, whether this will give any tangible results is a different matter. Languages that separated thousands of years ago at extreme ends of the continent, without post-split contact between them but with very strong contacts with other languages, are very difficult to relate beyond coincidental features (like, apparently, Hurrian, Uralic and Basque not allowing words starting with “r-”). Still, knowing what to look for and where, and a collective effort at it, is the only possibility of finding some results. Worth a try for historical linguists.

Closing remarks

This post is not intended to be controversial, nor to be exhaustive or especially accurate. Its main purpose is to stress the importance of studying languages within their real historical and geographical context, taking into account all the different factors that influence every specific language (which are different in each case), and to move away from more idealistic and abstract analyses that have proven to bring poor and flawed results. And very especially, to take advantage of the wealth of new data coming from aDNA, which puts some very tight constraints on what is possible and/or likely and what isn’t. aDNA is still a work in progress, but moving at an incredibly fast pace. It has already provided amazing discoveries that not only change history, but also have important consequences for linguistics. I encourage linguists to keep a close eye on these developments, even if it forces them to abandon a line of research to which they have dedicated a lot of time and effort. These are times of fast change, and it’s fundamental to adapt to them and not get stuck in old theories.

48 thoughts on “Ancient DNA and Linguistics: an introduction”

Rob says:

2nd October 2018 at 05:46

I think we can learn from the case of English, although what I’m about to say is off-the-cuff recollection rather than a recent detailed look.
First off, the ongoing connection with the homeland, as well as the arrival of “Vikings”, must have facilitated intelligibility between lowland England and north German communities. And I think the role of the Vikings has perhaps been underestimated. The contemporary sources (clergy and opposing local rulers) must have painted a negative picture of their presence (these “heathens”), but to rank-and-file Saxons, how different could they have been?

Then, the case of the Normans tells us how elite conquest might have impacted on language, and it’s a beautiful case study. The Norman conquest is one of the best examples of an elite conquest (EC).
EC is often outlined in historical- or socio-linguistic books, along with folk migrations/demic diffusion (DD) and language shift (LS), so I think everybody has a general idea of what they entail. Suffice to say, a major difference between EC and DD is not only the scale/number of people involved, but also the social dynamics.

On the one hand, aDNA has obviated the need for “EC” to explain the linguistic changes which must have accompanied the CWC and BB expansions, as we now know them to have been demic migrations, even if the migrant groups manoeuvred themselves into position as the elites in their new landscapes. On the other hand, EC is still invoked in discussing cases where the data do not match (or at least not yet) what might be expected under the received hypothesis, e.g. the IE migration into Anatolia. I have seen other mechanisms mentioned, such as language shift, which would require state-level organization. Needless to say, evidence for the former is not easy to demonstrate, and the latter does not seem possible due to the absence of a state apparatus until 2000 BC (Europe) and 3000 BC (southeast Anatolia), but this does not mean it did not happen. But to tease out these aspects would require more aDNA analysed in a multidisciplinary manner.

Back to the Normans: what did this elite conquest look like ?
As I said, I’m mostly recalling from a YouTube documentary I saw on William T.C., but the basics include
– a few thousand Norman knights invade a foreign land & defeat its army
– dispossess (most of) the local nobility
– erect a system of fortified castles to rule from, but also to defend themselves from the hostile countryside, alongside an array of other developments, incl. legal ones.
– establish a diglossia: French spoken in the court, and Old English by the peasantry
So we have an archaeologically demonstrable trail & a specific linguistic impact.

As Alberto mentioned, hundreds of years later, when this system ended, the adstratum effects of French (structural and loan-words) catalysed the evolution of Old English to Middle English, i.e. it differentiated from the North Sea continuum and set the path to what would be a recognizable ancestor of modern English.

Another example of EC often cited is the Magyar (proto-Hungarian) invasion. I think this entailed a movement of thousands of warriors, truly confederate in nature (as the aDNA glimpses have suggested); and given the unclear settlement dynamics of the Pannonian basin c. 900 CE, it is not at present clear where it would lie on the EC to DD continuum.
Alberto says:

2nd October 2018 at 11:40

@Rob

Yes, English is a good example of how social processes drive the language evolution. I’d add to what you said the later development known as the Great Vowel Shift, which might have been an anti-French reaction, intentionally moving away from a French-like pronunciation.

We could mention also the difference in evolution of English in America, or even better the Spanish in America, which did transfer to native populations who spoke completely different languages and after 500 years it’s still the very same Spanish language spoken in Spain.

This is why, while for example computer analysis is a nice thing, being unbiased and able to show statistics that would be difficult to obtain manually, it cannot be any kind of definitive tool (and this somehow extends to human analysis based strictly on language analysis without social/historical context).

I was looking at this graph of an IE tree generated by some program (unfortunately I forgot who exactly were responsible for it), and if I’m reading that correctly, it shows a split between Portuguese and Spanish about 500 years ago. The magnitude of this error is at least 100%, and this is with an incredibly high amount of data at its disposal for the analysis.

So even if the post didn’t come out as nicely put as I would have liked, I hope the basics are easy to understand, along with the importance of aDNA: languages whose context was previously pretty much unknown can now be analysed in a realistic way thanks to the new data (for Basqueologists, for example, this should have already been a fundamental turning point, but I’m not sure it has permeated yet. I hope to be able to write more about it in a separate post soon).
Kristiina says:

2nd October 2018 at 18:48

During the Mesolithic, the language map was probably mosaic-like as can still be seen when reconstructing for example the map of Australian aboriginal languages or indigenous North American language families:
https://aiatsis.gov.au/explore/articles/aiatsis-map-indigenous-australia
https://en.wikipedia.org/wiki/Classification_schemes_for_indigenous_languages_of_the_Americas#/media/File:Langs_N.Amer.png

Population explosion with farming probably caused a unification of languages in many areas. The next unification process started during the Bronze Age, and the modern expansive language families are probably the result of this process of increased mobility.

If two different languages are spoken in the same area side by side, and in particular if there is bilingualism, the grammar and the sound system start converging. Words are easily adopted at any point in time between languages, and the same root can be adopted several times from one language to the other and back to the original language in a new form. However, similarity of roots can even be a result of a Palaeolithic time depth, and the similarity of a root in two modern languages can be traced to languages that belonged to different language families than the current ones.

I think that the Nostratic similarity among Eurasian languages has at least two explanations: Paleolithic/Mesolithic dispersals of humans in Northern Eurasia and the Bronze Age convergence process that created the expansive language families that share many grammatical traits (accusative-nominative pattern, verbal conjugation with the marking of the subject only, singular-plural distinction of nouns, six-person system of pronouns and verbs).
Alberto says:

3rd October 2018 at 11:29

@Kristiina

Yes, in the case of the “Eurasiatic” subfamily of Nostratic the amount of language contact throughout the steppe, from Mongolia to Ukraine, has been so intensive (with nomadic tribes forming multiethnic confederations for millennia) that the strong similarities between them are easy to explain regardless of their respective origin.

It’s more difficult to explain the IE and Afro-Asiatic similarities in a model where the PIE homeland was in the steppe. A UP substrate is rather complicated too. Convergence after the IE dispersal does not seem to work either. So it would be interesting to know if those similarities are beyond coincidental or not.
Rob says:

3rd October 2018 at 22:38

Alberto, that image is from Garrett & Chang
http://linguistics.berkeley.edu/~garrett/Garrett-APS-2018-proof.pdf
Now that you point it out, the Portuguese split does seem too young.
As for the rest, I’ve just taken for granted that it seems correct in terms of what we are seeing at a population level, that there were significant population movements & social change after the middle Neolithic.

Btw, which are the Semitic – IE / AA similarities?
Al Bundy says:

4th October 2018 at 00:06

Thanks Alberto, regarding IE and Semitic if you look up Anatolia and Caucasus cradle of IndoEuropeans you’ll find a very interesting post about all these issues. The poster seems to be well-versed in what he’s talking about and with your knowledge you can judge better if some things he says make sense. Basically, a PIE homeland on the steppe doesn’t allow for the similarities between some Caucasian languages, Semitic, and IE. It was written 2 years ago so only the steppe and Anatolian theories are talked about, and as some of us suspect both those homeland theories are ultimately wrong, but Anatolian does allow for those contacts whereas the steppe does not.
Al Bundy says:

4th October 2018 at 00:38

Correction: a steppe homeland would allow for contacts with some Caucasian languages but not others, and not with Semitic, according to the writer. You add that to the archaeological evidence of Anatolian farmers moving into Europe and it made sense for Renfrew and others to think they were the Indo-Europeans. But they were operating without ancient DNA and it seems now that metalworking was a major factor in spreading IE.
Alberto says:

4th October 2018 at 10:11

@Robert

Thanks, that’s the paper. It’s interesting that they refer to the problem with Romance languages I outlined above and that kind of problem brought them to introduce some reality checks in the algorithm by hardcoding certain points in the tree. That clearly improved the general chronology to a much more accurate ~4000 BCE for PIE, but it’s somehow a pity that they didn’t go a step (or a few steps) further and acknowledge all the other problems involved in realistic evolution of languages. I guess that would have introduced a complexity in the analysis that wouldn’t have allowed anymore for a tree to be constructed cleanly with the computer generated statistics.

Still funny how after admitting the impossibility of a putative proto-Romance being the true genetic ancestor of Romance languages and not Latin:

“In other words, the difference posited between Latin and the putative Romance ancestor is like that between French and Italian, two mutually incomprehensible languages. Based on what we know about Latin, this is simply false.”

And probably knowing (though not explicitly mentioning) that neither substrate nor adstrate effects can explain this phenomenon, they just write that (emphasis mine):

“We argued that the effect in Figure 3 arises throughout the inferred tree (in the New Zealand team’s work and our replication) because of innovations that occur independently in related languages but were not present in their common ancestor. This happens often when languages share the same ingredients (related words with similar uses).”

This needs no detailed explanation here, since this is a blog whose readers know genetics and therefore know well that the same innovations don’t happen independently (and they didn’t happen in the geographically separated Romanian either).

As for the similarities between IE and AA I’m not qualified to answer that question, but Al Bundy’s referenced article:

https://cabalinkabul.wordpress.com/2016/01/24/anatolia-and-the-caucasus-the-cradle-of-the-indo-europeans/

Has some comments about it. For more details, probably Allan Bomhard’s freely available book is a good source of information.

@Al Bundy

Thanks for that reference. Interesting read. The fact that linguists (and now geneticists) had to mostly operate in a dual Steppe vs. Anatolian hypotheses has been quite detrimental for making other advances. I’ll address this in an upcoming post about Western Europe too.
Rob says:

4th October 2018 at 11:40

@Alberto – are you, then, suggesting that their PIE is dated too young?
Rob. says:

4th October 2018 at 11:55

On another point, from your conclusion about linguists getting involved and making use of aDNA: it has already begun. For example, the Danish teams bringing together geneticists, linguists and archaeologists. They implore others to do the same, or risk being left behind. (eg. https://www.academia.edu/32293183/Re-theorising_mobility_and_the_formation_of_culture_and_language_among_the_Corded_Ware_Culture_in_Europe)

The problem is that, in their enthusiasm, they’re not aware that their models are potentially erroneous, and several people have remarked and blogged on this already (e.g. BBB and Carlos). This is not to take away from their fresh approach, but it does show that multidisciplinarity and fresh evidence do not always lead to robust conclusions.

The central tenet of their model is that proto-Germanic evolved as Corded Ware culture merged with TRB in South Scandinavia. So they are going with the Battle Axe model, which has been around for a while and is entirely reasonable as a starting point. But they do not acknowledge that other theories exist, notably that proto-Germanic might have formed considerably later, and further south (e.g. Udolph, Schmidt, Schrijver, Dahl).

They also chose to go for a linear model, which might be simplistic: e.g., it assumes that after CWC / BAx nothing of cultural significance occurred which might have catalysed language change. It’s surprising that Kristiansen, of all people, would make this error.

Thirdly, Kroonen’s substratum theory is rather tenuous. He claims that he has uncovered a common Neolithic substrate stretching from the Aegean to Scandinavia. Not only is it unlikely that Neolithic Europe spoke a single language, but all the more so for communities in Scandinavia and Greece, which despite being “Neolithic farmers” were worlds apart culturally and also genetically.

But the ultimate litmus test for their model is this: if proto-Germanic formed as per their model, then Germanic speakers should be dominated by I2a1 and R1a, which is not the case, as any Joe-citizen genealogist can tell them. Instead, Germanic speakers are characterised by R1b-U106, I1, I2a2. We know this is no accident of modern history, as we have several aDNA studies from Iron Age and medieval Germanic groups, which mirror the above. We might say, oh, but Y-DNA is not everything. And that’s true, but when the basis of their model revolves around CWC male identity-making, then something is clearly wrong.

It would therefore be advisable that they address these more localised perspectives before creating overarching models (& maps) for the I.E. Urheimat as a whole, because potentially false assumptions about the latter have led to obvious errors in the former.
Alberto says:

4th October 2018 at 11:59

@Robert

No, not at all. The first attempts using statistical computer analysis were more consistent with the Anatolian Hypothesis, with PIE being much older than 4000 BCE. In that paper, by introducing some constraints to account for some problems with how languages evolve in reality, plus hardcoding some known dates, they got to a much more accurate estimate of around 4000 BCE.

Though I still think that the method is not reliable enough. They probably got the PIE dating more or less right, but it’s quite random how the algorithm works depending on what you feed it. It’s like trying to build a genetic tree of Chalcolithic to modern populations based on mutation rates. It doesn’t work. That only works for UP/Mesolithic populations. After the Neolithic the interactions between populations are much more complex, both genetically and linguistically. And that’s what needs to be analysed in detail and in an individual way, taking into account all the available data concerning the specific case.

As I said in the post, you can’t just assume that the similarities between Balto-Slavic and Indo-Iranian are due to a closer genetic relationship, ignoring the close and longstanding interaction between speakers of an early form of Indo-Iranian in the North Pontic area and the populations that would become Balto-Slavic.
Marko says:

4th October 2018 at 15:47

@Rob

Curiously, the very same group of researchers published several papers describing the south-eastern connections of the Nordic Bronze Age groups. Vandkilde calls the sudden appearance of weapons & warrior cults that first developed in the Carpathian basin an “exogenous flow of ideas”. I suspect she is very wrong about the nature of these cultural influences.

It seems that patriarchalism and elite dominance are often invoked selectively as explanations even by serious researchers. I guess the idea of (relative) autochthony is just inherently more appealing than conquests and the like.
Kristiina says:

4th October 2018 at 21:19

@ Rob
Semitic languages in particular contain several Nostratic features: accusative-nominative pattern, verbal conjugation with the marking of the subject only, and singular-plural distinction of nouns. While eight cases are reconstructed for PIE (nominative, vocative, accusative, genitive, ablative, dative, locative, instrumental), Proto-Semitic has only three (nominative, accusative, genitive).

In any case, several features are shared between Indo-European and Afro-Asiatic languages. However, the similarities between Indo-European and Caucasian languages may be more salient. From the point of view of phonetics, all these languages are rich in consonants. Ablaut/ apophony is frequent in IE languages on the European side of the family and ablaut is a fundamental principle in Afro-Asiatic languages and much more wide-spread than in IE languages where it is usually found in a few irregular plurals (mouse-mice) or in irregular verbs (sing-sang-sung).

A gender system is again a fundamental feature of the Afro-Asiatic language family and of some Caucasian languages. However, gender is often not reconstructed for PIE because Hittite lacks it. The WALS database shows that apart from IE languages, sex-based gender systems are typical of Indian, Afro-Asiatic and many Caucasian languages http://wals.info/feature/31A#2/36.3/70.3. PIE also shows ergative/active traits and these systems are frequent in the Caucasus and India. Afro-Asiatic languages, such as Proto-Semitic and classical Arabic, are accusative-nominative languages, but almost all modern languages lack this pattern and it has been proposed that Proto-Afro-Asiatic was an ergative type of language.

Articles, which are an innovation in IE languages, are shared with the Afro-Asiatic languages, and one could easily think that the European Neolithic languages had articles. One interesting point is grammatical reduplication which existed in Greek and Sanskrit, because reduplicative forms are frequent in Dravidian and Afro-Asiatic languages.
Kristiina says:

5th October 2018 at 07:01

In particular, Romance IE languages have converged with Afro-Asiatic languages: loss of case system, gender system with feminine-masculine distinction, use of articles, order of adjective and noun is noun + adjective (https://wals.info/feature/87A#2/18.0/152.9). The order of the genitive and noun has also changed from genitive + noun to noun + genitive (https://wals.info/feature/86A#2/19.6/152.9). Moreover, almost all IE languages in the west have turned from SOV languages to SVO languages, as have many spoken varieties of Arabic (https://wals.info/feature/81A#2/18.0/152.9).

My presumption is that in the Neolithic farmer languages that were spoken in Europe, ablaut/apophony was widely used, there were articles and the order of words was like in modern European IE languages and maybe there was a sex based gender system.
Al Bundy says:

5th October 2018 at 09:54

@Alberto Rob Wouldn’t 5000 BCE seem to be a more accurate dating for PIE?
Al Bundy says:

5th October 2018 at 10:19

Garrett’s paper only talks about the steppe and Anatolian models and seems to say, well, this model is wrong so that one must be right. It was also written before the Mathieson paper, which talks about a possible PIE homeland in Iran, and the latest papers, which seem to point to the same general area. Finally, there’s no consideration of not all IE languages expanding from the PIE homeland but some from a Late PIE homeland. Thank you Kristiina for all the info.
Al Bundy says:

5th October 2018 at 10:28

He does cite Reich’s book but doesn’t mention Reich’s opinion on where PIE is from. Johanna Nichols is also at Berkeley, like Garrett, and her work is looking better than ever.
Rob says:

5th October 2018 at 11:35

@ Marko

”Curiously, the very same group of researchers published several papers describing the south-eastern connections of the Nordic Bronze Age groups. Vandkilde calls the sudden appearance of weapons & warrior cults that first developed in the Carpathian basin an “exogenous flow of ideas”. I suspect she is very wrong about the nature of these cultural influences.
It seems that patriarchalism and elite dominance are often invoked selectively as explanations even by serious researchers. I guess the idea of (relative) autochthony is just inherently more appealing than conquests and the like.”

You’re correct – something happens in north-central Europe c. 2000 BC, after the large scale migrations from the steppe, which did not alter the autosomal structure, but there was certainly movement within Europe, and it profoundly altered social structure. It’s called the Bronze Age:)

The impression one gets reading some of the prevailing archaeogenetic discourse is that it’s rather blinkered. Archaeological context is often shelved back in the Supplementary sections. In turn, archaeologists and linguists who might want to take advantage of this data, but might not be very familiar with the complexities of aDNA (and it is a lot to take in, constantly growing), can read only a broad-brushed picture mostly revolving around genome-wide patterns analysed with certain a priori assumptions. Either that, or there are a select few who are merely propounding their own life-long held views.
This is the situation at present; it leaves room for improvement. But that’s what the future is for.
Rob. says:

5th October 2018 at 11:45

@ Al

” Wouldn’t 5000 BCE seem to be a more accurate dating for PIE?
Garrett’s paper only talks about the steppe and Anatolian models and seems to say well,”

It’s hard to say: 5000, 4500, 4000 BC? How can we tell exactly?

Your second point is correct, and Alberto mentioned it too. It’s important not to feel constrained to any one or two models, but to use the data to its full potential.
Whatever the answer is, more data is required from important regions (Greece, Anatolia, South Asia), even if most people think the question is already solved.
Alberto says:

5th October 2018 at 13:38

Yes, I agree, the attempts so far to catch up with aDNA have been a bit lacking. I guess that in academic terms this is still too new. There really is room for improvement, and that’s what I was trying to point out in the post. I’ll elaborate more about some of the problems pointed out in the above comments in the next post.

I agree also about the dating of PIE. Broadly it’s Chalcolithic (and not Early Neolithic). But it’s not possible for now to give a specific date.
Alberto says:

5th October 2018 at 13:47

@Kristiina

Thanks for chiming in to the rescue about the question regarding IE-AA similarities 🙂

As for the changes in Romance languages, I’m not sure why they happened. But we do know the languages spoken by the time of the Latin expansion: basically Celtic and Ibero-Vasconic (Tartessian is difficult to classify so far). And none of these seems to resemble Semitic languages. Basque seems more similar to North Eurasian languages in general terms. I don’t know much about Etruscan.
Marko says:

5th October 2018 at 15:31

One rarely discussed implication of Garrett’s & Chang’s model is that, with the imposed time constraints, non-Balkanic IE is inferred to have separated around 2200 BCE, which means that most extant European IE languages might derive from one cultural complex that existed well into the Bronze Age (again the Carpathian cultures come to mind). One might speculate that this closeness is a result of areal influences, but I don’t know if that suffices as an explanation. I suspect a later split of European IE would push PIE as a whole back in time.

If the model turned out to be at least somewhat accurate the spread of IE with both CW & BB would probably be completely untenable.
FrankN says:

5th October 2018 at 20:10

There are a lot of things to comment and/or expand upon, and some issues (e.g. Proto-Germanic, as addressed by Rob, or the affiliation of EEF language(s)) probably deserve a posting of their own. So, let me start with a few minor things:

1. Icelandic:
I myself, in a comment on another blog, have brought up Icelandic as an example of an “isolated language”. Unfortunately, I seem to have been wrong. As P. Schrijver, in “Language Contact and the Origins of the Germanic Languages”, points out (p. 160):
“We know that Iceland was uninhabited, apart from the odd Irish monk, when it was discovered in the ninth century; it was subsequently quickly settled by the Norse from Norway and the British Isles, so language contact may have played an insignificant role in shaping the ways in which Old Norse changed into Icelandic, although we do know that a significant proportion of the earliest settlers of Iceland came from Ireland and bore Irish names, suggesting they may have brought their language with them to Iceland.”
Now, IIRC, according to one of the abstracts presented at last month’s conference in Jena, significant Irish genetic impact on Icelanders can be demonstrated. And, in line with this (and the overall theme of this post), Icelandic contains a number of Gaelic borrowings, e.g. tarfur “bull” (Ir. tarbh).

2. Great Vowel Shift
The English Great Vowel Shift is a fascinating, so far poorly understood phenomenon. Socially, it emerges from the Black Death, which eliminated much of the old Norman landholding elite and led to the emancipation of non-Norman urban as well as rural populations (as labour was scarce and consequently well-paid, the cities, especially London, provided ample opportunities for migration and employment). However, that process wasn’t an “anti-French reaction”, but rather the convergence of different local dialectal traditions (some Anglo-Saxon, others post Celto-Romance) in SW England.
The initial phase of the Great Vowel Shift (ca. 1400-1550 CE) comprised as major shifts:
1. “bite”: /i:/ -> /ɛi/
2. “meet”: /e:/ -> /i:/
3. “out”: /u:/ -> /au/
4. “bloom”: /o:/ -> /u:/
5. “late”: /a:/ -> /ɛ:/ (-> /eɪ/ after 1600)

Intriguingly, more or less the same shifts mark the transition from Middle High German (MHG) to early modern High German (HG) around 1350, i.e. roughly a century earlier than the English shift, setting it further apart from Low German (LG). Some of the shifts are also present in Modern Dutch (D). Compare:

1. “bite”: MHG bīzen -> HG beissen; vs. LG bieten, D bijten [Ital. pizza borrowed from Langobardian]
MHG, LG min vs. HG mein (“mine”)

2. “meet”: HG Ried, D riet vs. LG Reet (“reed”)
HG lieb, D lief vs. LG leev, ME leef (“dear, beloved, lief [arch.]”)
[Less well documented, shift already commencing in MHG/MD, but HG streamlined the MHG “i-e” diphthong into a long “i”]

3. “out”: MHG uz, HG aus vs. LG ut, D uit;
MHG, LG hus vs. HG Haus (“house”)

4. “bloom”: HG Blume, D bloem [D “oe” ~ /u:/] vs. LG Bloom [/o/-sound maintained]
[The HG/D /u:/ sound emerged from an earlier (MHG/ MD) “u-o” diphthong]

5. “late”: HG gehen vs. OE, MHG gân, LG gahn, D gaan “to go”
[Somewhat irregular, sound shift partly already occurring in MHG]

I deem it unlikely that Early Modern English adapted to Early Modern High German pronunciation for political reasons, as a kind of “anti-French reaction”. In that case, it should have adopted HG orthography alongside the new pronunciation instead of maintaining the OE spelling.
Moreover, the Great Vowel Shift occurred at a time when the Hanseatic League was at the height of its power and introduced Middle Low German as lingua franca across North Central Europe from Bergen in the NW to Dortmund in the SW and Tallinn (Turku?) in the East. Equally rich and powerful during that period were the Netherlands including Flanders. Given the intensive English commercial, social and political connection to Dutch and LG speaking areas, especially during the early 16th century CE (Reformation!), if there had been a political statement to be made, or if the Great Vowel Shift reflected external language contact, the shift should have gone towards Dutch and Low German, respectively, not towards High German as the language of Catholics and the Habsburgs.

This leaves the apparent linguistic parallel of the Great Vowel Shift to the MHG – HG transition as a yet unexplained enigma. The most parsimonious explanation, IMO, is a resurgence of Celto-Romance dialectal traditions in both England and SE Germany/ Austria, where the MHG – HG vowel shift originated, facilitated by the partial collapse of the feudal landholding system after the Black Death that a/o also allowed for the resurgence of Czech.

One indication in this respect is the (secondary) /a/ – > /ɔ/ shift as in „walk“ that is equally present in modern Bavarian dialect (c.f., as a more pronounced shift towards open “o”, MHG wage -> HG Woge “wave”).
Moreover, two of the a/m sound shifts were already present in Old French/ Old Picardien, from where they may have influenced both MHG/ MD and ME, but may equally also point at a dialectal substrate with a more widespread distribution along the Roman Empire’s NE frontier. These sound shifts were:

(4) /o:/ -> /u:/, e.g. OF flours vs. Lat. flores “flowers”,
OF poule vs. Ital./Span. pollo, Provencal pola “hen”;

(5) /a:/ ->/ɛ:/; e.g. OF feve, French fève vs. Lat. faba “bean”.

In addition, the /e:/ -> /i:/ sound shift (2) occurred at least partially across a broad range of Romance languages. Compare, e.g. “foot”, from Lat. pēs:
– Walloon, Dalmatian pi, Friulian pit, French pied, Span. pie, Rom. piez; but
– Port. pé, Old Occ. / Romansch pe, Sard. pee, Neap. pere, Catalan peu.
[Quite an interesting Romance isogloss, isn’t it? Note, btw., that Alemannic (Schwyzerdütsch) also didn’t partake in that vowel shift, which has me thinking of a Venetic rather than a Gaulish dialectal substrate. The earliest High German attestations stem from 12th century Tyrol and Carinthia, i.e. regions that delivered late Hallstatt/ early La Tene Venetic inscriptions]

Of course, further linguistic research is required to confirm or reject this explanation, and especially to shed more light on the origin of the /i:/ -> /ɛi/ and /u:/ -> /au/ shifts. Such research is so far lacking with respect to the MHG – HG sound shift – even recent German linguistic text books just state laconically “causes yet unexplained”. From what I have seen, British research on the Great Vowel Shift is hardly better.
Atri∂r says:

6th October 2018 at 07:03

@Alberto
You say in your OP: “we know with a high degree of certainty that at that time and place their language had to be a very early form of Indo-Iranian, that even if in the Iranian branch already (not very clear), it had to be closer to Sanskrit than to any other known language (including Avestan).”

If the first part of this statement is true, I agree with you that the second part is most definitely true. However, we never had any proof of the first part and we still don’t. There is circumstantial evidence. And lately, that evidence is facing serious challengers.

In reference to your headline and the larger gist of your OP, I think that though some linguists have been slow due to unfamiliarity with the new genetic science, there is a deeper instinctual fear that many of the beloved models that justify the field will crumble. Starting anew is intimidating, especially when one relies on the soundness of one’s field for a living.

Lastly, the Nostratic model in my opinion is almost equivalent to numerology; squint and see connections in everything. To a much, much smaller degree, even the Indo-European models suffer from this at times. Except for Nichols’ model.

As Rob says, more sampling is needed, specifically from the areas he mentions: Greece, Anatolia, and South Asia. These are the only areas that matter at this point (for a PIE solution).

There is one joker in the cards though that might skew even these potential future samples…
Alberto says:

6th October 2018 at 11:03

@Atri∂r

Well, the evidence about the geographical and temporal origin of the western Scythians is not so circumstantial now. They had admixture from Siberia and BMAC and we have samples from Kazakhstan ca. 1500 BCE that look like clear ancestors of those western Scythians. I wrote in more detail about it here, and you can also check Chad Rohlfsen’s latest post here using formal methods.

I agree about the problems that linguists face when dealing with this new data. I know it’s not easy to transition to the new era, but it’s something that must happen. The sooner the better.

And sure, we still need quite a few more samples to get definitive answers (if we do get them) about linguistic topics that go into prehistoric times. But things are getting really interesting lately. Stay tuned for further posts about these topics 🙂
Alberto says:

6th October 2018 at 11:06

@FrankN

Thanks for all the interesting information. No time now to reply to all of that, but many of those things do deserve posts of their own, so maybe that’s the way to go if we want to go deeper into it.
Atri∂r says:

6th October 2018 at 15:36

@Alberto
That the genetic evidence that they were Scythian is solid, yes; linguistic evidence that they were early Indo-Iranians in the Copper or Bronze Age, no. That is where it is circumstantial.

Yes, I agree, something that must happen.

Looking forward to your coming posts.
Alberto says:

7th October 2018 at 00:56

@Atri∂r

I’m not sure I’m following you. The linguistic evidence from the Copper Age or Early Bronze Age is obviously missing. But what does it have to do with the Western Scythians from the Iron Age in the North Pontic region? From those ones, we do know with a decent degree of certainty (even if the evidence is scarce) that they spoke an Indo-Iranian language. It has usually been classified as East Iranian. Knowing where those people formed, and knowing the languages spoken in SC Asia from around 1500 BCE or a bit earlier (early Indo-Iranian), I’m just arguing that the language of the Scythians would have had to be early Indo-Iranian (maybe already in the Iranian branch, but close enough to the split to be as close or closer to Sanskrit, which in its Rigvedic form is quite close to the split, than to any other known language, including Avestan, which is further away from the split and more clearly in the Iranian branch).

So which part exactly are you referring to as being circumstantial and lately facing serious challenges? (Note that I’m not making any statement as to the place from where I-I arrived in SC Asia, since I have no idea at this point. We just know it was there at that time).
Rob says:

7th October 2018 at 01:35

@ Alberto

About Slavic & Iranic contacts, you can read Trubachev’s & Golab’s work, although it is from decades ago. They attempt to uncover the layers of linguistic strata in Slavic.
Trubachev opts for a more western homeland, yet both point out the obvious contacts between Slavic and Iranic from the Iron Age.

Golab’s book available here https://slavica.indiana.edu/system/tdf/bookContent_pdf/04_Golab%20-%20The%20Origins%20of%20the%20Slavs.pdf?file=1&type=node&id=622&force=
Atri∂r says:

7th October 2018 at 20:28

@Alberto
My position for several years has been exactly what you just mentioned. It still is (my position) to an extent, with one loophole (arising from recent DNA samples). Because of this loophole, I do not think we are in a position of certainty. We have a few names of kings and gods from Herodotus and that is some data for sure, but not a position of proof. Unfortunately, we just don’t have any records of the Scythian languages (till the Khotanese texts in the East).

To answer your question, I guess I’d have to ask you where you thought the PIE homeland was, as this is precisely tied to whether the Western Scythians were I-Ir in the Iron Age. It also depends on who you think the Western Scythians were in the Iron Age, as opposed to the early Middle Ages or Classical Antiquity.
Vara says:

7th October 2018 at 22:17

There was some sort of Indo-Aryan or Indo-Iranian language north of the Caucasus. Some Cimmerian names have been argued to be a mix of Iranian and Indo-Aryan and we do have a famous Indo-Aryan hydronym (Kuban river) in Circassia.
FrankN says:

7th October 2018 at 22:47

Alberto – re Scythians:

this is a bit off the top of my head, as I lack the time now to repeat the in-depth research I did about a year ago.

First of all, note that Greek historians used the designation “Scythian” in as vague a manner as they used “Celts”. The latter a/o also comprised Rhaetics and W/N Germanics, while they included Goths and Huns among the former. As such, the fact that a certain group is addressed as “Scythian” by antique writers doesn’t allow for a precise linguistic assignment.

More specifically, the various “Scythian” groups described by Herodotus include:

– the Geloni, according to H. ancient Greeks driven away from the Black Sea coast, and by his time still bilingual. Their capital Gelonus might have been the large IA fortified city under excavation at Bilske Horodyshche near Poltawa, Ukraine.

– the Budini, settling around the Geloni but speaking a different language, had acc. to H. all red hair and deep blue eyes, and were by other classical authors described as reindeer herders. Their ethno-linguistic association is debated. Some authors regard them as proto-Slavs, other as Uralics (Votians, Permians) – an issue where aDNA clearly may be of help. In any case, there appears to be consensus (however well founded) that the Budini didn’t speak Indo-Iranian.

– the Androphagi (lit. “man-eaters”) between upper Dnjepr and upper Don. Gimbutas hypothesised that OGr. Androphagi may have been a translation of Iranian (Scythian) *mard-xwaar “man-eater”, a folk-etymological interpretation of the ethnonym “Mordva” (Uralic Mordwins).

– the Neuri, by H. placed into the Bug area, commonly linked to the Narew river (W. Belarus/ NE Poland), i.e. an area assumed to have spoken W. Baltic during the IA.

Last but not least, some of the tribes listed by H. as dwelling east of the Urals are for ethnographic reasons (pastoralist nomads living in tents) believed to have possibly been speaking Turkic (Chuvash?).

In short, while ancient “Scythians” most likely incorporated an Iranian-speaking component (but note that the Sauromatae ~ Sarmatians, the most likely “tribe” to have spoken Iranian, were labelled by H. as “non Scythian”), we shouldn’t assume “Western Scythians” to represent a linguistically coherent grouping – quite the opposite seems to have been the case.

More specifically, there is lots of nitty-gritty work yet to be done in aligning – for each locality in question – the archeological record with aDNA results and reports by ancient historians. The aDNA papers available so far on “Scythians” have shied away from such efforts that, in all fairness, may also go beyond what should be expected from them. The task is now on archeologists and linguists to synchronise their findings with the aDNA data that has become available. When that task has been accomplished, we may have gained a reasonably firm ground to base conclusions on. Until then, I for myself have decided to remain rather tacit on “Scythian” aDNA, which isn’t exactly my home turf.

That, of course, doesn’t mean that a well-founded blog post, which addresses “Scythian” aDNA on a local scale within its specific archeological context in relation to antique written sources couldn’t be informative.
Al Bundy says:

8th October 2018 at 01:36

@Atrior Unless I misunderstood, the 2 locations for R1A are south of the Caucasus or the Siberian Corridor, but if R1A is from Siberia PIE comes from J2. I’m assuming your stance is basically the same as it’s been, obviously needing more samples from the early attested IE languages.
Atri∂r says:

8th October 2018 at 06:13

@Al
Yep 😉 With all the recent papers, I think it’s pretty safe to say that R1a is from up north. But we do need a lot more ancient samples from down south. That said, I feel that this might face serious challenges because of an impossible situation to rectify (concerning R1a in the south).
Alberto says:

8th October 2018 at 18:50

@Atri∂r

Yes, the evidence about the linguistic affiliation of Scythians is scarce, though I think it’s hardly controversial to say that Indo-Iranian was one of their languages, and probably the main one in the earlier period following the Srubnaya culture. We have some scarce direct evidence and indirect evidence from Uralic or Balto-Slavic.

Sure, as FrankN has elaborated above, the term Scythian is very generic and refers to many different tribes in a large area during a long period. Uralic was probably the second most used language by tribes referred to as Scythians during the Iron Age, and later Turkic started to arrive in the west too (but that’s probably more in a post-Scythian period already). I don’t know which samples you are referring to when mentioning that loophole.

As to where I think the PIE homeland was, I don’t think I could point at any specific place. Probably in the vicinity of the Black or Caspian seas, but whether north, south, east or west I don’t know. And that’s a pretty broad area. My position is neither strongly for one area or strongly against another. I’ve been “forced” to argue a lot against the people trying to sell me for years that the Steppe Hypothesis was super solid and beyond proved, which made many of them think I’m against it by principle. But no, that’s not the case. I just think that the PIE question is a very complicated one and every hypothesis has many problems, including the Steppe one. So while in absolute terms it might seem I give very small chances to the Steppe hypothesis to be correct, when I put things in relative terms (compared to others) I don’t see another one that is clearly superior in terms of solving all the problems in a better way. There are just a number of possible scenarios, and I wait for more data to start revealing which of them becomes more likely than the others. I may lean towards North Iran at this point, but not very strongly so. Time will tell.
Rob. says:

9th October 2018 at 11:51

@ Frank

”More specifically, there is lots of nitty-gritty work yet to be done in aligning – for each locality in question – the archeological record with aDNA results and reports by ancient historians. The aDNA papers available so far on “Scythians” have shied away from such efforts that, in all fairness, may also go beyond what should be expected from them. The task is now on archeologists and linguists to synchronise their findings with the aDNA data that has become available. When that task has been accomplished, we may have gained a reasonably firm ground to base conclusions on. ”

I think necessarily so. For example, the most recent paper on Black Sea Scythians and Sarmatians was cautious about linguistic ascriptions, from my brief initial read. How do we know exactly what these “Royal Scythians”, “ploughmen Scythians”, “Neuri” really were?
They’re often just brief mentions, sometimes just a hapax legomenon.
But we know that by the Greek colonization period, at least Iranic was spoken, and perhaps from considerably earlier.
Allan R. Bomhard says:

11th October 2018 at 21:58

Very interesting paper, and very interesting comments. Correlating the linguistic, genetic, and archeological evidence, as currently understood, is a daunting challenge, to put it mildly.
Alberto says:

12th October 2018 at 18:19

@Allan R. Bomhard

Thank you for reading and leaving your comment. Indeed it is a big challenge and it will take time and collective effort to make good sense out of it, but it’s also a very exciting one. I hope that as a professional you will be able to enjoy it as much as some of us are doing as amateurs.
Egg says:

12th October 2018 at 20:03

Alberto, from my recollection your overall preference was for an origin on the eastern (Don-Volga) PC steppe, which I think has only been strengthened with every subsequent paper. What made you change your mind towards something from north Iran?
Alberto says:

13th October 2018 at 09:14

@Egg

I think I’ve discussed in the past that the steppe population that later expanded into Europe came from the east or south east of Ukraine, and that the neolithic population from Ukraine was replaced by them. I mentioned the North Caucasus, North Caspian and, more speculatively, Central Asia as possible origins of this population. However, I don’t think I ever implied that they were necessarily Indo-European speakers. I’ve been rather sceptical, or at least cautious, when linking this population to IE. There were always several things about that hypothesis that never worked for me.
Marko says:

13th October 2018 at 09:46

@Alberto

With the recent findings of significant steppe admixture and Y-DNA replacement in Western Europe where non-IE languages were spoken well into the historic period, and the general decline of CWC across much of Europe, have you considered the possibility that Indo-European languages were introduced to Europe late – perhaps no earlier than Únětice related expansions? It’s not something I’d necessarily put much stock in before more DNA from Anatolia, Greece and India becomes available, but I think the samples we have from those places might point in the direction of something like that.

Anyway, I would love to read your thoughts on the whole PIE question in the future. This is definitely the most interesting blog of this kind out there. Great work.
Alberto says:

13th October 2018 at 10:26

@Marko

Thanks for your kind words.

Indeed I think that with the current data the link between CWC or BBC and IE languages is tenuous. A later expansion is a more attractive proposal, and Unetice is a key culture in Central Europe that is quite interesting from the genetic perspective, having strong steppe admixture but Neolithic Y DNA.

But as you say, we still need more samples from the key areas to be able to put forward something a bit more solid. Hopefully coming soon.
Rob says:

13th October 2018 at 15:34

@ Alberto
It’s not just Neolithic Y-DNA (e.g. I2) in Unetice, there’s also R1b-U106. I think it was found in one of the Hungarian BB/EBA. U106 is also found in the Late “CWC” individual from Sweden, and MBA Netherlands.
South of Unetice (Hungary) there’s J2.
So in that part of ECE is where steppe, Old Europe and Bronze Age influences meet, and it was evidently characterised by several lineages, confirming the idea that it marks the first Chiefdom society in temperate Europe, running from southeast to northwest.

By contrast, a more direct continuity from BB is seen from Bavaria and further west. Similarly, the epi-Corded horizon extends from southern Poland to the Russian forests in what would be an R1a contiguous zone, more or less still as segmentary lineage groups.
Egg says:

13th October 2018 at 20:19

Alberto, thanks for the clarification. I agree with you about the change in the Ukraine, with extra ancestry, compared to the preceding population, from the western farmers and from the eastern steppe populations where the ‘steppe’ signal seems to first appear, but I didn’t realize you weren’t sure about connecting this eastern to western steppe movement to IE. For me on the other hand it was an interesting initial hint of a scenario I always found very plausible and which I think has gained in strength since then with the South Asian and Caucasus papers.

Marko, have you picked up Drews’ latest book by any chance? Part of your argument about a relatively late Indo-Europeanization reminds me of it, though I haven’t read it fully yet. I’d tend to agree that certain secondary importations from the Carpathian Basin appear rather late, after the already important previous movements that have been sampled so well compared to everything after, but I’m not sure to what extent there was long-term linguistic influence involved. One example is the chariot complex in northern Europe, and perhaps Greece too if the Greek arrival precedes MHIII: a clear signal of later, secondary steppe influence in those areas, though whether it reflects cultural transmission or a genuinely *important* movement is not too clear. Then again, there are Aegean links during the period too and, as Rob mentioned, J2a all the way up to Hungary and what looks like extra CHG in post-Vucedol.

As you and a few other people have mentioned, the post-Beaker Central European horizon of cultures has always loomed rather large for a number of branches (most at various points, even, except the more geographically marginal/innovative ones like the ‘Greco-Aryan’ group), so secondary expansions affecting language have a high prior probability to me. But I think IE at least must have initially arrived “relatively early” to most of Europe with the decently sampled Corded, Beaker and the Balkan Bronze Age. One interesting point that you bring up, on the other hand, is whether those early groups managed to effect local, long-term linguistic change in most places rather than just genetic change, à la Basques. It’s a bit hard to really know that considering we have Bronze Age attestation from only a particular place in Europe. Maybe we’ll never know beyond arguments about substrata, unless we really find no evidence of any sort of later important influence in some places. I admit that, e.g., some varieties of “Celtic” in Iberia and Ireland I personally find harder to connect to later movements, considering the links are a bit weaker with later Central European phenomena like Urnfield and La Tene, but it’s also a bit hard to think that even the most plausibly archaic varieties could have been separated from each other for so long.

Rob, I think the U106 you have in mind might be the Beaker Hungary I4178. Close to Beaker and post-Beaker western populations and also Unetice. I think there was a Czech Unetice sample with U106 too but I don’t recall details.
Marko says:

13th October 2018 at 20:53

@Rob

Have you read Drews’ 2017 book ‘Militarism and the Indo-Europeanizing of Europe’ by chance? He makes the point that the developments that led to the spread of at least Germanic/Celtic/Italic took place in the south-east of the Carpathian basin, where the sudden shift towards a militarized and stratified society, parallel to very similar developments in the Mycenaean world, was effected by a foreign elite. The picture he paints is quite fascinating – Drews argues that in the Yamnaya-Vučedol descended cultures (Nagyrev, Hatvan etc.) of the Carpathian basin the concept of war seems to have been quite foreign to the native population, and that they must have accepted the rule of the armed invaders who settled there without much of a fight. After the arrival of the newcomers ca. 1600 BCE, in contrast, hundreds of bronze swords can be found in Romania, Hungary and Serbia.

Drews derives the Carpathian invaders from a steppe culture near the Kuban river I’d never heard of (Solomenka). The Mycenaean elites, he argues based on a review of partially unpublished evidence, most likely had their origin in the Trialeti culture of Georgia.

It appears that the new elites in these regions had no intention of replacing the native populations, preferring instead to exploit their labor. If this is true, it would be fascinating to know if & in what way the elites were genetically differentiated from the ordinary population, and what the many languages & cultures were that inevitably must have been lost in the process.

@Egg

Oh lol, I hadn’t seen your comment.
Rob says:

15th October 2018 at 12:11

@ Marko & Egg.

I think there is a movement from Anatolia to Europe from 2500 BC onward. In terms of colonization, it’s limited to Thrace and Greece. Further movement to the Adriatic and Carpathian basin perhaps involved more individualized groups mixing with demographically larger central European populations (Mako, post-Vucedol, BB, etc). North of the Carpathians, the genetic impact is minimal at a population level, and the lineages are ‘European’ (I, R1) but with a more varied selection of sub-haplogroups. This suggests the emergence of new chiefs in regional groupings which would have produced a relatively variegated linguistic landscape.
I’m not sure about what happens in Italy, too little data at present, and would have to guesstimate based on archaeology.
As for Drews, I like his analysis of military archaeology, but I don’t think there was a conquest of Europe in 2000 BC.
Marko says:

15th October 2018 at 18:00

@Rob

I agree, Drews’ compilation regarding the militarization taking place in South-Eastern Europe is impressive, and I think this phenomenon is undoubtedly real. However his explanation of an invasion seems a bit ad hoc. An infiltration of Anatolians in the Bronze Age seems like a more realistic proposal. I do think the development of sword production beginning in Arslantepe and later Alaca Höyük might have some significance though. I believe samples from the former site have already been tested.

When it comes to haplogroups I do wonder how lineages with Bronze Age TMRCAs like E-V13 and some clades under J2a & J2b came to have seemingly pan-European distributions, attaining significant frequencies even in places where later metal age intrusions would be unexpected, like for instance Scandinavia and the forest steppe.
Robert says:

16th October 2018 at 13:50

The Steppe, SEE, Caucasus and Anatolia had been interacting since the 5th millennium BC, and this continued to the middle Bronze Age at least (Catacomb, Trialeti, BA Anatolia, Mycenae, etc). A complete understanding of this will take more time. But the potential’s definitely there.