The ancient DNA (aDNA) era has been a revolution (still in progress) in our understanding of prehistory, and one of its implications has been a renewed search for the origin of languages (mostly the origin of IE languages, which has been a central subject in many aDNA publications). However, I have the feeling that linguists themselves are not very actively taking part in, or taking advantage of, the wealth of new data that has come to light so far and continues to come. Here I’ll outline some of the basics of how the available data can already help linguistic research and point to a few of the future developments to look forward to.
Language families, mutation rates, divergence, convergence… The lesson from genetics
Up until now, linguists have mostly relied on the analysis of languages themselves in their studies of language families that go back into prehistory. Not that they have ignored the archaeological data, but admittedly, archaeology could not provide the level of information about ancient populations that aDNA can, and so their reliance on archaeological data was quite generic, without it imposing any serious constraints on their linguistic research. Therefore, language families and their family trees have been mostly based on linguistic data alone, including statistical analysis with computer programs relying (to a degree) on mutation rates. So I’d like to debate this a bit in the light of aDNA to see how these things can correlate.
The first lesson to learn comes from the way genetics has evolved in Eurasia, since to some degree languages develop in a similar way. We now have a decent idea of how anatomically modern humans (AMH) have evolved genetically from their main Out of Africa event to the present. Very roughly, we’d have two clearly different phases:
- The first phase would be the Upper Paleolithic (UP), when populations were venturing into different areas of Eurasia previously uninhabited by AMH. Divergence between these populations, due to mutation rates/drift and lack of contact in such a vast area, is what led the genetic development into several “very” (in the Eurasian diversity context) divergent branches that could for the most part be reconstructed in a tree-like structure.
- The second phase would be the Holocene, in which the improvement of the weather conditions and, more importantly, the transition to food production (agriculture) allowed for population growth and expansion. In this second phase mutation rates and drift become an irrelevant part of genetic evolution, and instead of divergence we shift to convergence dynamics driven by population movements and contacts (admixture either through specific migrations or just through what is sometimes, not very intuitively, called Isolation-by-Distance). Here there’s no valid tree-like structure to explain the genetic evolution. You have to track the specific population movements, their geographical location and contacts, and disentangle all the processes that led to modern populations being a big mix of UP ones. Any West Eurasian PCA will clearly show the UP/Mesolithic samples on the outside and the modern ones on the inside, surrounded by the former.
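To make the convergence point a bit more concrete, here is a toy sketch in Python. Everything in it is an assumption for illustration only: the allele frequencies are made up, the three “source” groups are hypothetical, and a plain Euclidean distance stands in for whatever distance a real analysis would use. It only shows the arithmetic reason why admixed populations end up “inside” their sources on a plot.

```python
from math import sqrt

def admix(sources, weights):
    """Allele frequencies of a population formed by mixing `sources`
    in the proportions given by `weights` (which must sum to 1)."""
    assert abs(sum(weights) - 1.0) < 1e-9
    n_loci = len(sources[0])
    return [sum(w * s[i] for s, w in zip(sources, weights)) for i in range(n_loci)]

def distance(p, q):
    """Euclidean distance between two allele-frequency vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Hypothetical allele frequencies at four loci for three long-isolated groups
# (made-up numbers, not real data).
pop_a = [0.9, 0.1, 0.8, 0.2]
pop_b = [0.1, 0.9, 0.3, 0.7]
pop_c = [0.5, 0.5, 0.1, 0.9]

# Two later populations formed by admixture between neighbouring groups.
mixed_1 = admix([pop_a, pop_b], [0.6, 0.4])
mixed_2 = admix([pop_b, pop_c], [0.5, 0.5])

print("A vs C:            ", round(distance(pop_a, pop_c), 3))
print("mixed_1 vs mixed_2:", round(distance(mixed_1, mixed_2), 3))
```

In this toy setup the two admixed populations are closer to each other than the unmixed groups A and C are, and each mixture sits between its own sources. A tree cannot represent that: each mixed population has more than one parent.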
When it comes to languages we have to take these patterns into account, because they are the main driving forces in language evolution, just as they are in population genetics. I’ll stress that this is something new in population genetics, discovered only in the last 4-5 years with the advent of ancient DNA, which forced us to throw away most of the previous population genetic beliefs. A similar revolution awaits linguists when they start to catch up with all this new information.
The Nostratic hypothesis is a controversial one that postulates a macro-family of languages all sharing a common origin and then diverging from each other with time. Obviously this does not mean a common origin going back to the main Out of Africa event. That event is likely the common origin of all Eurasian languages, but it’s way too deep in time. We even have the much more recent main “Into America” event, some 17 kya, involving a small population (last time I checked it was estimated at around 70 individuals) that obviously spoke one single language before expanding throughout the new continent(s), and even there it clearly doesn’t work: there are many dozens of language families that cannot be related to each other even though we know they once were. Nostratic (at least in its modern conception, for which I’ll follow Allan Bomhard as the main proponent of the hypothesis) refers to languages of West Eurasian origin going back to the early Holocene and probably expanding with agriculture. Let’s start with this image:
If we look for example at the Eurasiatic language subfamily, we find it somewhere around 6000 BCE in Southern Central Asia and expanding from there to the NE (Altaic), NW (Uralic) and West (IE). It is indeed an interesting proposal from a theoretical point of view. But how does it stand in the new reality brought by aDNA?
First, let’s make it clear that saying languages are genetically related means that a parent language existed at some time, in some place, spoken by real people, and that the dispersal of those people (or of the languages alone, by cultural diffusion without explicit migration) is what made it split into different languages.
So who were these proto-Eurasiatic speakers? Do we have any evidence of them being in that place at that time and expanding in those directions? For anyone who has followed the ancient DNA developments the answer is very clear: No. Such a population didn’t exist in that place at that time expanding in those directions.
I do understand that Bomhard’s proposal is just a proposal about the possible place (not so much about the time, which should be more accurate), and that he is probably not strongly tied to such a geographical location. But the problem is not the specific geographical location. Trying to find any other that is compatible with what we already know will yield the same result: No, it didn’t happen. It’s already difficult enough to find a population that spoke PIE, so try to find a common ancestor of that population and the ones who would become proto-Uralic and proto-Altaic, some 2000 years earlier than whatever PIE population one postulates.
What does this mean? Simple: that such languages are not genetically related. They have different origins.
So how do we explain their similarities? THAT is a good question, and one that needs new answers. If (pre-)Altaic ultimately has an origin in, let’s say, East Eurasia (North China?) and IE (and most likely Uralic) one in West Eurasia, why did Altaic languages (and especially Turkic) come to resemble those West Eurasian ones?
Generally speaking, two languages can have more-than-accidental similarities either through a genetic relationship or through convergence (isn’t this what happens in population genetics too?). The mechanisms of convergence are various and more or less well known, but unfortunately they are not well weighted and are mostly considered “add-ons” rather than a main driving force in language evolution. This is an area that needs a real breakthrough if linguistics is to catch up with reality.
If someone is jumping to the conclusion that I’m proposing that languages are basically creoles, no, that’s not what I’m proposing. That kind of mixing is a different phenomenon, of the sort seen in some communities, for example in North America, where people mix Spanish and English while speaking, even in the same sentence. A language like English, on the other hand, is not a creole. It’s not a mix of Germanic and Latin/Romance. It’s a Germanic language, no doubt, but with strong Romance influence (Norse too, but that’s Germanic). But what happens when we go deeper in time and influence starts to permeate some of the core vocabulary of each language? Is it still as easy to tell what is what? No, linguistic reality shows us that many times the borders get too blurred to really know (just as genetically related languages, like those in America, simply become too different to be recognised as genetically related anymore).
And while talking about English (which is a simple example for everyone reading this), let’s take a look at it too. How did it become mutually unintelligible with Continental and Nordic Germanic, and how long did it take? As for the second question, it depends on how mutually intelligible Old English (7th to 11th centuries?) might have been with other Germanic speakers of the time (either Old Frisian or Norse). If it indeed was still intelligible, then the process happened in the subsequent centuries of Middle English (12th to 14th-15th?). Whatever the case, that’s quite a fast process, and it clearly didn’t happen because of isolation and mutation rates. It happened due to strong influences from other languages with which it came in contact. Compare this to the quite exceptional case of Icelandic, one of the few languages that we can say evolved in the “old” (Upper Paleolithic) way, by becoming geographically isolated with limited contacts with the parent language after splitting, with mutation rates being its main driving force of evolution (unless I’m missing something, which could be, since I’m not well acquainted with Icelandic history).
Another interesting example could be the evolution of Romance languages from Latin. This is a question that has been disputed for probably centuries by now. The basic problem comes down to how to explain that Romance languages share features that were not present in Latin (prepositions and articles instead of the Latin case endings, loss of the neuter gender…). Linguists have tried to answer this with different theories, starting with the simplest one:
- The changes happened prior to the expansion of the language. Simple as this explanation is, it could never stand scrutiny. Not only would it require that throughout the whole Roman period Latin was a dead language used only in written documents (and not only literary ones, but administrative ones too, with many people having to study a language that was difficult for them in order to write it), but there’s hardly any evidence of such a language (proto-Romance) in written form of any kind. There are small things written about mistakes that commoners made that resemble later Romance languages and the like, but without entering into long details, let’s just say this disproves rather than proves the existence of such a language. Moreover, we have Romanian, which does have some of the Latin features lost in the other Romance languages (like the neuter gender).
- The substrates theory: basically arguing that the different substrates in the territories in which Latin expanded would explain the differences with Latin. But again, this never worked because we have a fairly good idea of the kind of languages spoken in those territories (Celtic ones, for the most part, and other non-IE ones) and they neither explain the changes nor do they correlate with Romance languages in any significant way.
- The superstrate theory: this one tries to explain the evolution of Romance due to the different Germanic tribes entering different parts of the western Roman empire. I don’t think this one deserves further comment.
- Others: like the role of the Christian church in its different territories as a driving force of the language changes/preservation and other exotic ones that I don’t even remember.
Doesn’t all this sound like someone is missing something that might be important?
This is not just theory (and, to an extent, not a very new one); it’s the practical consequences that really matter. Many of these I will write about in future posts (this is why I called this an introduction), and some tidbits I’ve mentioned in past ones, like here about the Scythians. Now that we know the place and time of the Scythian genesis (ca. 1600-1500 BCE in Central Asia) and their cultural and genetic relationships (mostly with BMAC, with some Siberian), we know with a high degree of certainty that at that time and place their language had to be a very early form of Indo-Iranian. Even if it was already in the Iranian branch (which is not very clear), it had to be closer to Sanskrit than to any other known language (including Avestan). We also know that they migrated west and replaced/assimilated the Srubnaya culture, which didn’t leave any direct descendants (Sarmatians were just Scythians who moved west carrying their language). So this has specific consequences for the analysis of surrounding languages, chiefly Uralic and Balto-Slavic ones. Regarding the latter, I already explained there how, knowing the place and time of the putative split, a genetic relationship with Indo-Iranian is inconsistent with the data. More important still is the fact that around 1500 years of early Indo-Iranian spoken throughout the steppe, in the neighbourhood of the people that would become Balto-Slavic, cannot just be ignored in any analysis.
It’s a similar thing with Uralic languages. Given not only the extreme difficulty of a genetic relationship with PIE, but also the difficulty of contemporary and contiguous homelands influencing each other (or at least PIE -> PU), it’s mostly later contacts that take the main role, so knowing with whom the Uralic speakers could have interacted, and what language those people spoke, is fundamental. In due time, when the upcoming aDNA is released, I’ll try to devote a post to Uralic in the aDNA context, because it will clarify the situation by telling us the few realistic options that are left (and it’s not surprising that cultures related to the Corded Ware culture (CWC) are becoming increasingly popular as the preferred option for pre-Uralic, since for now it’s the only clear candidate. But I don’t want to take part in this debate for now. Let’s wait to see if there are other realistic candidates and if they’re better ones than CWC-related people).
And I can’t finish this post without referring to Basque. I will write a separate post about it in the coming weeks, but I’d like to outline here a few basic things. Basque has been a very mysterious language for linguists and historians alike. There have been so many theories about it (colourful and exotic ones included) that it would be tiresome to even name them. Fortunately, aDNA has come to the rescue leaving us with an amazingly simple scenario: There are only two realistic options for the origin of Basque:
- It was brought by Early European Farmers (EEF) during the Neolithic, ultimately from the Near East (Anatolia or beyond)
- It was brought by the Bell Beaker folk during the early Bronze Age, ultimately from the steppe/North Caucasus (more details about this here).
Technically there is a 3rd option: that it was the language of Western Hunter-Gatherers (WHG), but that has such a low probability that it does not deserve further investigation. Everything else (a late arrival from North Africa or the Eastern Mediterranean, a relict from the Cro-Magnon people, etc.) is just fantasy.
(By the way, I’ll expand on the Ibero-Vasconic subject in the mentioned future post, but let’s just say here that in recent years, thanks to some advances especially related to the numerals in Iberian, it’s become mostly accepted that we’re talking about closely related languages, in the numerals specifically more closely related than many pairs of IE languages, and likely not due to borrowing but to a deeper relationship between them, unsurprisingly.)
I don’t want to debate here about the likelihood of the 2 options mentioned above, since that would bring undesired controversy to this post (I’ll do so in that next one), but let’s see a few consequences of this:
- Were (Ibero-)Vasconic languages spoken over a larger area at some point? This used to be a debated issue, but it is no longer debatable. The answer is very clear now: obviously yes. Or does someone think that Aquitanian fell from the sky in South Western France? No, related languages were once spoken all over Europe. There’s no way around that. And depending on which of the two mentioned populations brought it to Western Europe, they would have also been spoken in the Near East or in Easternmost Europe (the steppe). Whether it’s possible to recover this substrate is a different matter. Honestly, substrates from long gone languages are a very complicated issue. I think that once we know that such a substrate existed at some point, it’s worth a collective effort from every linguist to try to find it (knowing what and where to look for), because that will help clarify the linguistic history of Western Europe. Whether that will give any results is something I cannot answer. Maybe it won’t. Maybe it will. But just dismissing Vennemann’s attempt as an extravagant one is clearly not going to be productive or take us anywhere.
- And where are the related languages? This again depends on who brought the language to Western Europe. If it was the Bell Beakers, one should look at languages like the North Caucasian ones, especially NE Caucasian (and Hurrian-Urartian), since Lezgins or Tabassarans are probably the most direct descendants of the pre-Yamnaya population from the North Caucasus steppe. (We’re still waiting for Hurrian aDNA, Hurrian being a possible ancestor of NE Caucasian. Given that we know that someone brought R1b-Z2123 to the Near East starting ca. 2400-2300 BCE, and that it was not Kura Araxes, the time and place of appearance of the Hurrians might make them the best candidates for it. Still just my speculation, but if someone sees a better candidate please comment about it.) Also Uralic. And more distantly other North Eurasian languages possibly influenced by the steppe (now surviving basically in Siberia). If instead it was the EEF who brought the languages, then clearly Afro-Asiatic (and especially Semitic) would be the best candidates (along with Kartvelian or maybe IE, if it is proved not to be from the steppe). Again, whether this will give any tangible results is a different matter. Languages separated thousands of years ago at extreme ends of the continent, without post-split contact between them but with very strong contacts with other languages, are very difficult to relate beyond coincidental features (like, apparently, Hurrian, Uralic and Basque not allowing words starting with “r-”). Still, knowing what and where to look for, and making a collective effort at it, is the only possibility of finding some results. Worth a try for historical linguists.
This post is not intended to be controversial, nor to be exhaustive or especially accurate. Its main purpose is to stress the importance of studying languages within their real historical and geographical context, taking into account all the different factors that influence every specific language (which are different in each case), and to move away from more idealistic and abstract analyses that have proven to bring poor and flawed results. And very especially, to take advantage of the wealth of new data coming from aDNA, which puts some very tight constraints on what is possible and/or likely and what isn’t. aDNA is still a work in progress, but it is moving at an incredibly fast pace. It has already provided amazing discoveries that not only change history, but also have important consequences for linguistics. I encourage linguists to keep a close eye on these developments, even if it forces them to abandon a line of research to which they have dedicated a long time and effort. These are times of fast change, and it’s fundamental to adapt to them and not get stuck in old theories.