Origins and spread of Indo-European languages: an alternative view


After over 5 years of being away without officially leaving, I’ve finally got around to writing a closing post for this blog. And unsurprisingly, it deals with the Indo-European (IE) question, which has been the main focus of ancient DNA studies and the subject that has drawn the most interest from the people who followed them. I thought I’d never have to write this post, since back when I stopped writing things were already clear enough, and it should have been only a matter of months before the mainstream publications wrote what I’m going to write now. However, 5 years down the line this is still pending, and many linguists have been trusting that the current mainstream view is essentially proved and that they have to adapt their theories to those findings. It was when I found this out some 2-3 years ago that I first thought of writing a final post about the subject (though it’s taken a while to actually do it), and that brings me to the main purpose of this post: it’s written mostly for linguists working on the history of languages, not for people interested in ancient DNA studies. This is because the latter can already make up their own minds about the interpretation of the findings, while the former are largely dependent on the interpretation and conclusions that they are given by those writing the studies. And I think they deserve to have an alternative view before they go too far in changing their work just because it may not fit well with what they’ve been given as proven facts about the origin and spread of IE languages.

Since this is quite a big subject, I’ll look at each geographical area (from west to east) and cover the basic evidence we have from each of them before going on to look at the whole picture and a final summary. And ultimately, as has been the case with all the previous posts on this blog, it will be in the comments section where there will be further discussion and details about all of the things mentioned in the post, so stay tuned for those and feel free to participate with any questions or thoughts so that we can all get a better understanding of the data available and its interpretation.

Europe

We’ll start with this area of the IE-speaking world, which is probably the easiest to understand, though unfortunately the ancient DNA studies have not explained it in a way that is useful for historians and linguists. Here we’ll try to address that issue and explain the historical reality of the region on a population basis.

The Upper Paleolithic in Europe has been explained well enough, basically showing the discontinuities between different periods (Aurignacian, Gravettian, Magdalenian…). The old idea that the earliest Anatomically Modern Humans (AMH) that populated Europe are the ones from whom modern Europeans mostly descend (with Basques often being cited as the most direct descendants of the Cro-Magnon people) has been thoroughly disproved. All of those early populations went extinct, and the last one to arrive did so probably just before the Last Glacial Maximum (LGM) some 25-22 thousand years ago (likely from Anatolia) and was not associated at the time with any specific culture until the Epi-Gravettian from after the LGM, mostly from Italy. It was this population that inhabited Europe during the Mesolithic period and has been named Western Hunter-Gatherers (WHG).

With the advent of the Neolithic a new population started to colonise Europe (again, from Anatolia, though this time the origin is proved and not just likely), bringing farming with them. They are known as the Early European Farmers (EEF) or, sometimes, Anatolian Farmers. Their expansion throughout Europe was slow (as these were sedentary populations), but they eventually populated most of Europe, replacing the Mesolithic WHG populations that preceded them. However, throughout this long period and as they ventured deeper into Europe, the farmers did acquire increasing levels of admixture from those WHG populations, reaching up to 25% WHG ancestry in their otherwise Anatolian Farmer genomes.

However, by the end of the Neolithic another big event happened at the population level in Europe, and it is the most crucial one for the purposes of this post and the one that has not been very well explained so far.

Depopulation and Repopulation of Northern and Western Europe

Some 10 years ago, genetic studies started to appear (Haak et al. 2015 was the first of them) showing a surprisingly large migration from the Eurasian steppe into Europe at the end of the Neolithic (starting c. 3000 BC) that changed the genetics of the European populations to form the basis of what modern Europeans are. These steppe migrations were associated with the Corded Ware Culture (CWC) in Northern to Central Europe, and with the Bell Beaker Culture in Western Europe. It was said that their genetic impact was roughly 50% across Northern Europe, going down to some ~30% towards Iberia. It was also stated that there was a male bias in this genetic impact, given that the Y Chromosome (passed from fathers to sons) of Europeans turned out to be of steppe origin while the mitochondrial DNA (passed from mothers to sons and daughters) was largely of Neolithic origin. Somehow, our (modern Europeans’) fathers came from the steppe and our mothers from Anatolia. It was also speculated that the reason for this could be the steppe people bringing some pathogens with them that may have severely impacted the Neolithic populations they came across, with some strain of Yersinia pestis (the cause of the plague) found in steppe samples being the main suspect (link, link). This also was in line with other genetic studies (link) that had shown a big bottleneck in populations across Europe (particularly all across northern and western Europe, with Italy and the Balkans seeing a smaller one and Greece not seeing it at all, see Figure 2) at the end of the Neolithic, as well as a rapid expansion of a few paternal lineages (link).

Now, this may be correct (or mostly), but it fails to provide an explanation of what happened that can be easily understood by anyone confronting this information. We have to take a step back, focusing not so much on the genes but rather on the people (the communities of people) who carried those genes so that we can make better sense out of it.

During the Neolithic period, communities of people from Anatolia started to settle in Europe, advancing slowly until they occupied the majority of the European territory. They had a distinct genetic profile when compared to the WHG that lived in Europe before their arrival. This applies both to the autosomes (basically their whole genome) and to their uniparental markers (the Y Chromosome for the paternal ones and the Mitochondrial DNA for the maternal ones). The most prevalent paternal lineages were the ones under the G2a branch. WHG, on the other hand, had most of their paternal lineages under the I2a branch. Minor paternal lineages in both populations didn’t overlap either, at least initially. However, slowly over the 4000 years between ~7000 BC and ~3000 BC, the farming communities admixed occasionally with the hunter-gatherers, which resulted in their acquiring genome-wide signatures of WHG (very low in the Balkans, but increasing towards central, northern and western Europe, to around 25%) as well as WHG uniparental markers. Interestingly, once the WHG paternal lineage I2a entered the farmers’ gene pool, it rose in frequency to the point that by the end of the Neolithic it had become the most common one among farmers, relegating their original G2a to second place. This pattern usually points to some sort of selection, though in this case the reason is unclear (and for the purposes of this post, irrelevant anyway).

Then around 3000 BC something happened throughout Europe, especially affecting the northern and western parts of it, and causing a big population collapse. The reasons for this are unknown – it could be a change in the climate (the end of the warm period known as the Holocene Climate Optimum, which triggered another series of events that could include hunger due to the lack of crops, disease, an increase in violent conflicts, etc.), but once again the reason is not really relevant for the purpose of this post. Suffice it to say that the Neolithic population across Europe got severely decimated, with many areas becoming completely depopulated.

Meanwhile, in the North Pontic steppe small populations of pastoralists had started to thrive with their mobile economy that was not based on crops, but instead on animal husbandry. These types of populations have proven to be more resilient to the sort of changes that greatly affect the larger, more densely packed and sedentary ones that rely heavily on crops. They’re also much more mobile and can occupy the territory much faster if the conditions allow for it. And this is basically what they did when the Neolithic communities from Europe collapsed.

These steppe populations had probably originated on the North Caspian shores (maybe when they started to have domesticates in the mid-late 6th millennium BC, if not earlier), but it was not until they moved west to the North Pontic region and the conditions allowed for it (the invention of wheeled vehicles ca. 3500 BC seems to have been a crucial factor, though pulled by oxen, since they didn’t have horses as once believed) that they started to expand very successfully. The initial separation into the two main groups may have happened around that time (mid 4th mill., from some North Pontic culture like the Lower Mikhaylovka groups) to end up forming the Yamnaya Culture and the Corded Ware Culture (CWC), with the former occupying most of the steppe (especially if we include the very closely related Afanasievo Culture) and the latter expanding into what I will refer to as the Corded Ware Horizon (CWH), which would include the Bell Beaker Culture (BBC) to the west and the forest steppe cultures of the time (Fatyanovo-Balanovo, Abashevo, Sintashta, Andronovo) to the east, covering an extremely vast territory that went from Western Europe to Southern Siberia by the end of the 3rd mill. For now we’ll turn our attention to this CWH group.

The CWH people separated from the other main steppe population by heading north and leaving the steppe for the forest steppe. While the exact place and time of their initial steps is not known so far, we do know that they started to expand ca. 3000 BC reaching the Baltic Sea and moving west to Central Europe where they appear around 2800 BC. During this initial expansion, they encountered a few areas where the Neolithic populations had not died out completely. And the reason why we know this is because the steppe populations started to show admixture from those Neolithic Farmers (probably from those left from the Globular Amphora Culture) and we know that this admixture came from incorporating EEF females into their communities. We don’t know the details of how these “foreign” females were incorporated (could be from peaceful agreements, could be by force, we don’t know), nor their exact status in these steppe communities. But we do know that their offspring must have had the exact same status as the rest of the people in the community, since there’s no genetic difference across these communities, where the “foreign” genes were spreading equally among the whole community. Population growth must have been a priority for these small steppe communities, probably because the conditions they were finding allowed (and maybe demanded) such growth. They were successfully populating vast, largely depopulated areas that they could exploit and it seems that whenever they had the chance to incorporate females from the few Neolithic communities they found along the way, they did so in order to increase the growth rate. Males, on the other hand, didn’t seem to have been welcome, probably due to the patriarchal nature of these steppe people that organised themselves in family clans (much like the late Neolithic farmers from Europe did too). 
The evidence for this dynamic of incorporating females but not males is very clearly seen by looking at the uniparental markers, where we do see European Neolithic haplogroups in their mitochondrial DNA, but not a single European Neolithic haplogroup in their Y chromosome, and then at their autosomes, which show the genome-wide admixture they were getting via these females.

By the time they reached Central Europe around 2800 BC, the CWH people had around 30% admixture from the Neolithic farmers. Quite a significant amount, but not surprising given how fast a small population can change genetically when it starts incorporating “foreign” genes into its pool. Then during their stay in Central Europe, this admixture increased to around 50% by ca. 2500 BC (which means that they still found some Neolithic communities that survived there and from which they could incorporate females). However, one may wonder what was happening in those Neolithic communities meanwhile. That’s something we don’t really know. We don’t have a single sample in the ancient DNA record from the Neolithic communities from the periods just before, during or after the arrival of the steppe communities. The only evidence we have that some of them survived the collapse comes precisely from the admixture that we see in the steppe communities that were occupying their former territories. So, essentially, a few of the Neolithic communities lived just long enough to see the steppe ones arriving and acquiring females from their communities before they died out completely (we don’t know if it was this “borrowing” of females by the steppe groups that precipitated their final extinction, though that’s a possibility even if the “borrowing” of females didn’t imply any violence).
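To make the arithmetic behind this dynamic easier to follow, here is a minimal sketch in Python. It is only a toy model under loud assumptions that are mine and not taken from any study: every father comes from the steppe community, a fixed share of mothers in each generation (`foreign_mother_share`, a hypothetical parameter) are incoming farmer women with no steppe ancestry, and each child inherits half of its autosomes from each parent. The function name and the numbers used are purely illustrative.

```python
def steppe_fractions(foreign_mother_share: float, generations: int) -> dict:
    """Toy model of female-only admixture into a steppe community.

    Assumptions (illustrative only, not from any published study):
    - all fathers come from the community itself,
    - a constant share of mothers per generation are "foreign" farmer
      women carrying 0% steppe ancestry,
    - children get half of their autosomes from each parent.
    Returns the community's average steppe fraction for the autosomes,
    the mitochondrial DNA and the Y chromosome.
    """
    if not 0.0 <= foreign_mother_share <= 1.0:
        raise ValueError("foreign_mother_share must be between 0 and 1")
    if generations < 0:
        raise ValueError("generations must be non-negative")

    autosomal = 1.0  # community starts as fully steppe
    mtdna = 1.0      # maternal lineages, passed on by mothers only
    y_chrom = 1.0    # paternal lineages, passed on by fathers only

    for _ in range(generations):
        # Average steppe ancestry among this generation's mothers:
        # local mothers carry the community average, foreign ones carry none.
        mothers = (1.0 - foreign_mother_share) * autosomal
        # Fathers carry the community average; children are the midpoint.
        autosomal = (autosomal + mothers) / 2.0
        # Only local mothers pass on steppe mitochondrial lineages.
        mtdna = (1.0 - foreign_mother_share) * mtdna
        # No foreign males are incorporated, so y_chrom never changes.

    return {"autosomal": autosomal, "mtdna": mtdna, "y_chromosome": y_chrom}


if __name__ == "__main__":
    # Hypothetical scenario: 10% of mothers are foreign, for 7 generations.
    print(steppe_fractions(0.10, 7))
```

With the hypothetical 10% share, the autosomal steppe fraction falls to roughly 70% after 7 generations, the steppe share of mitochondrial lineages falls to a little under half, and the Y chromosome stays entirely of steppe origin. This is the same qualitative pattern described above, though the real rates and number of generations are unknown.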

Then from Central Europe, around 2500 BC, these steppe communities continued their expansion to Western Europe. We know the communities that did so as the Bell Beaker Culture (BBC), but they were the same people. Curiously, this expansion to Western Europe started from a very small clan within the CWC people. And we know this because they had a Y Chromosome haplogroup that was very rare among the CWC (the vast majority of males from the CWC had a subclade of the R1a branch, while the males from the BBC had one from the R1b branch). This also stresses how small the initial population that repopulated Western Europe must have been: essentially a small family clan that, once settled in Central Europe, started to be successful and then went on to occupy the whole of Western Europe. This, again, was facilitated by the fact that most of Western Europe had become almost completely depopulated. For example, the BBC people who colonised the British Isles were genetically identical to how they were already in Central Europe. In other words, on their way to the Isles and on the Isles themselves, they didn’t seem to have found any females from surviving Neolithic groups to incorporate into their own communities and grow faster. Here, both the lack of direct evidence of any surviving Neolithic community and the indirect evidence from the absence of any traces of admixture in the steppe populations that moved across that territory indicate that it was almost completely (if not completely) depopulated.

On their way to the Iberian peninsula and in the peninsula itself, however, they did find some surviving Neolithic communities, as again we see further admixture coming from the “foreign” females they were incorporating into their own communities. By the time they had settled the Iberian peninsula, this admixture had increased to around 70%. But again, we have no direct evidence of these surviving Neolithic communities from the time when the steppe people arrived. It’s just the indirect evidence (in the form of admixture in steppe populations) that allows us to know that they must have been there, even if only for a short time after the steppe people arrived (whether the arrival of the steppe people is what precipitated their extinction is something that, once more, we don’t really know – but it seems plausible).

What the whole process described in the above paragraphs basically means is that Northern and Western Europe were completely (re)populated by people who came from the steppe. By communities, clans, of people that came from the steppe. This was not a 50% replacement of the previous Neolithic population. It was a 100% replacement. Every single Neolithic community died out before or at the time the steppe communities arrived. The fact that some (or many) of the genes from EEF survived (through those females that were incorporated into the steppe communities and passed their genes along) does not have any historical (and therefore linguistic) relevance. The people, the communities of people with their culture and language, that populated all of these parts of Europe were originally from the steppe. All of them. We don’t have evidence of even a single exception. The paternal lineages from the Neolithic people disappeared simply because the Neolithic communities of people disappeared.

Thus, after this expansion throughout the 3rd mill., we have the CWH people all the way from Western Europe to the Altai Mountains of South Siberia. And they were the sole occupants of all that area. Basically a big family very closely related to each other and without any discontinuity in their occupied territory. Which means, clearly, that they all spoke the same language, and probably that the divergence between the language spoken by someone in Ireland or Iberia (BBC) and someone in Southern Siberia (Sintashta-Andronovo cultures) ca. 2000 BC was not very large. Which takes us to the next question: which language was that?

Mainstream studies have been suggesting that the CWC must have spoken something they called “Indo-Slavic”, i.e., an Indo-European language from which both Balto-Slavic and Indo-Iranian languages descended. But that would imply that such a language was also spoken throughout Western Europe, something that we don’t have any evidence of whatsoever. Moreover, it would imply that Celtic and Italic would be descendants of Indo-Slavic, something that is at odds with basic linguistics.

Therefore, it would be better to suggest that they spoke an older form of Indo-European from which all the other branches descended (except the Anatolian branch, and maybe Tocharian). The only problem is that not only do we not have any evidence to support this, but all the evidence we have contradicts this idea.

To examine this, we should start from the easiest place: the Iberian peninsula, where we have earlier evidence of languages than in the rest of the territory of the CWH, and which, being rather isolated in the far west, is free of confounding factors. And when we look at the earliest languages known from there, we see that the languages that were not replaced by the recent (at the time of the recorded languages) Celtic expansion were non-Indo-European. A few years back I already wrote some insights about the languages of Iberia, looking at the relationship between Basque and Iberian, as well as at the substrates. There I presented some of the latest linguistic research (which was, and probably still is, only available in Spanish) showing the shift in the paradigm, from considering the relationship between Basque and Iberian a sort of legend to the now most accepted idea that they are indeed genetically related languages. I also explained that one of the obstacles that this possible relationship had to overcome was the belief that the Basque and the Iberian people were completely unrelated, with Basques being descendants of the first European AMHs and Iberians being a Mediterranean population. Not only is this problem now solved, but the fact that we now know that Basques and Iberians were the exact same people who arrived shortly after 2500 BC and settled the whole peninsula (without any of the Neolithic populations that lived there before surviving) actually makes it almost impossible to argue that they could have spoken different (unrelated) languages. This is one of those cases where ancient DNA has come at the right time to confirm without a doubt the recent (and at the time slightly controversial) linguistic research. (As a side note, when talking about Iberia I don’t refer to Tartessian because of its unclear classification, with the only possibilities being that it was either a Celtic language or, more likely, a form of Iberian).

When it came to substrates, I pointed out how the ancient DNA evidence had disproved a line of research that had become very popular and accepted: the Indo-European substrate throughout the Iberian peninsula, especially strong in areas (south, east and even the Basque Country itself) where non-Indo-European languages were spoken at the time of our first records. This theory was championed by the prominent linguist Francisco Villar, who was finding Indo-European substrates everywhere, but he was very adamant in pointing out that they were non-Celtic and non-Italic (and obviously not Indo-Slavic either, just “unknown” IE). This was all a way to prove the Paleolithic Continuity Theory, and it had several followers who contributed to it. In that article, I looked at one study (in English) by Leonard A. Curchin where he goes through the substrate in Catalonia (an Iberian-speaking region) and finds that 50% of it comes from that non-Celtic, non-Italic IE branch (in contrast, he only finds 10% of the substrate to be Iberian). The confirmation that this theory cannot be correct has significant implications, since the reason why that 50% substrate was considered IE was none other than the fact that it was found in many other parts of Europe (where Iberian could never have been spoken, according to the tradition). This brings us to the next point, which is the large amount of non-IE words incorporated into the reconstructed PIE. This heavily “vasconised” (from Vasconic) reconstruction of PIE has also been found by one researcher based on statistical analysis (I can’t comment on the validity of the method used, but somehow the result seems to be correct, even if by chance):

“The new surprise is that PIE, as usually reconstructed, appears to be a sister-language of Basque, in complete breakaway from Hittite. Amazingly, PIE would be as close to Basque as the North Caucasic languages are close to each other. This clearly shows that PIE, as usually reconstructed, must be seriously erroneous and contains plenty of substratic Paleo-European words, that drag the general picture away from Hittite and closer to Basque.” A lexico-statistical comparison of Basque, Arnaud Fournet (draft, 2018).

A related phenomenon was found by Ranko Matasović when looking at the substrate in Balto-Slavic, noticing a common substrate in Northern and Western European IE languages not present in SE European ones:

“This paper presents an analysis of those words, attested in Balto-Slavic, that do not have a clear Indo-European etymology and that could have been borrowed from some substratum language. It is shown that Balto-Slavic shares most of those words with other Indo-European languages of Northern and Western Europe (especially with Germanic), while lexical parallels in languages of Southern Europe (Greek and Albanian) are much less numerous.” Ranko Matasović, Substratum words in Balto-Slavic, 2013.

When we look at modern Basque, we see that it’s absolutely full of Latin/Romance loanwords, which is expected given the last 2000 years of history, while it has very few Celtic ones (also expected, since their resistance to the Celtic expansion must have made them enemies and limited their contacts during the several centuries of neighbourhood), but there’s no trace of the old IE language that the CWH people would have spoken during the 2000 years prior to the arrival of Celtic.

Looking outside of Iberia we keep finding problems that can’t be explained if the CWH people had spoken an IE language. A non-Indo-European substrate in insular Celtic (usually considered either Afro-Asiatic, which we now know can’t be correct, or Vasconic) wouldn’t make any sense. Nor would it make any sense for Germanic to be the least Indo-European of all the known IE branches at its core. There is a clear necessity for Northern and Western Europe to have a non-IE substrate, and an even clearer necessity to have a source for the non-IE languages attested. For a substrate, you need longstanding interaction between locals and migrants, with locals (usually the majority of the population) switching gradually to the language of the incoming people, first as a second language and eventually as the only one. This didn’t happen here, since interactions between locals and incoming people ranged from very short to non-existent depending on the place, and no local population switched to the language of the migrating one because, quite simply, no local populations survived.

In summary:

  • Northern and Western Europe experienced a population collapse at the end of the Neolithic (starting around 3000 BC and finishing around 2300 BC in some southern areas of Iberia).
  • Populations from the steppe (CWC and BBC, who were the same people) repopulated all of Northern and Western Europe. A 100% population turnover.
  • These populations from the steppe came from a small group initially, so they all had to share the same language.
  • That language had to be non-IE according to all the evidence we have.

However, the good thing about both IE and whatever language was spoken by the CWH (which, due to its geographical and temporal location, I will refer to from now on as the North Eurasian Bronze Age (NEBA) language family) is that the areas covered are very large, so we will go through the rest of them to test what I’ve proposed here against the data we have from those areas.

Italy

In contrast to Northern and Western Europe, Italy didn’t experience a complete collapse of the Neolithic population. It’s likely that several areas got severely decimated or even completely depopulated, but Neolithic communities still persisted during and after the arrival of the people from the steppe. Therefore, the picture we have is quite different, with two populations of different origin inhabiting the area during the Bronze Age.

From a linguistic point of view this would mean that two language families may have been in use throughout the Bronze Age, one from the EEF (unknown family) and the other one from the CWH (NEBA language). The picture we get by the Iron Age, when we start to have evidence of the languages spoken in continental and peninsular Italy, is analogous to what we see in Iberia: all the populations that didn’t switch to the recently arrived Celtic and Italic languages spoke a non-IE one. We don’t have any traces of an Indo-Slavic language or any other old form of IE that could be attributed to an arrival ca. 2500 BC.

Looking at the genetics, we have samples from Etruscan and Italic Speakers from Central Italy and they are both more or less identical and both largely descend from the CWH people (not 100% as in Northern and Western Europe, since in Italy they did admix further with the EEF that lived on along the Bronze Age). In other words, while no conclusive evidence can be learned from Italy alone, it’s all compatible with what we’ve seen in the previous section. To clarify, the Etruscan language itself could either come from the CWH (more likely) or from the EEF (less likely, but perfectly possible). This is ultimately a linguistic problem. (NOTE: As I was writing this, a new study with samples from Iron Age Picenes from Novilara and Pesaro -North Picene speakers, a poorly attested and controversial language- has been published. No surprises, as the samples resemble the above mentioned ones being largely of steppe origin).

As a side note, and for the sake of completeness, a few words about Sardinia. Modern Sardinians are outliers among the European populations in that they derive most of their ancestry from the EEF that colonised Europe from Anatolia during the Neolithic. However, ancient DNA does not show complete continuity since the Neolithic. We have samples from the Bronze Age that have steppe origins. The contacts between Sardinia and the Mediterranean coasts of Iberia, France and Italy are then proved by these samples, though even without them it would still be reasonable to think that there were longstanding contacts between Sardinia and those other areas that were inhabited by CWH people. Therefore, it would be a mistake to assume, based on the modern DNA, that Paleo-Sardinian must be a language that came from EEF. It may well be from that source, but it may just as well be a NEBA language borrowed from the neighbouring regions of mainland Europe. Once more, this is just a linguistic problem, since DNA allows for both options.

South Eastern Europe

Unlike the rest of Europe, the Balkans didn’t see any migration from the CWH people. Instead, it was the sister branch, the Yamnaya people, who moved into the Balkans in the period from ca. 3200 BC to 2500 BC. As in Italy, the Balkans didn’t see a full collapse of the Neolithic populations, but probably the northern parts of it did see a significant decimation in the Neolithic people that facilitated the arrival of steppe populations. The southern parts (modern day Greece) remained fully populated by its Neolithic inhabitants along the 3rd millennium.

It’s hard to estimate accurately the impact of the steppe migrations in the Balkans due to not having enough samples so far, but in general we can say that it was significant but relatively modest compared to the rest of Europe. After 2500 BC, it’s likely that no new migrations occurred from the steppe, and the steppe people who were already in the Balkans must have started to mix with the local populations (more on this later).

From a linguistic point of view, what is remarkable at first sight is that we don’t have any surviving non-IE language in mainland South Eastern Europe (SEE), even though it’s the area where languages are attested earliest compared to the rest of Europe. And the best explanation for this is that Indo-European speakers entered SEE at an earlier date, replacing the languages of both the EEF and the Yamnaya people before the Iron Age.

We are still missing the direct evidence from the critical samples, but we’ve had the indirect evidence for quite a while. Let’s look at the details.

Indo-European populations started to enter SEE during the period from 2400-2000 BC. They came from West Asia (North West Anatolia was the immediate origin, but ultimately their origin had to be deeper into West Asia, around the South Caucasus) and settled the area of Thrace during this period. We don’t have direct genetic evidence of this, since we simply lack any samples from this place and time, so I’ll quote from a relevant paper about the archaeological side of it:

“So, while the first half of the 3rd millennium BC in Thrace is characterised by a (comparatively) moderate level of social and economic complexity and the ideological dominance of pastoral tribes of a north-Pontic origin, there is a real explosion in complexity in the period between 2400 and 2000 BC and the region becomes increasingly included within a much wider network that is now dominated by frequent and highly visible exchange and trade, and new forms of prestige and status expression”

“The same conclusion of the existence of foreigners is also indicated by the use of many exotic and prestigious objects, often made of silver. This metal was not readily available in EBA Thrace. We can also note that tin-bronzes may have arrived into this region via Anatolia rather than Europe […] and it is difficult to imagine how such a quantity and quality, and the imaginations and customs behind these, can be transferred to Europe without having individuals or groups of people carrying them, and the infrastructure to organise their transport and wider distribution”

“There can be no doubt that the driving force behind this influx of goods and people is enhanced exchange and organised trade, and it is in no way an accident that concurrently the largest exchange network the world had seen up until then arrived at its peak. This network was centred in southern Mesopotamia, a region that had been fully urbanised for at least a millennium, and it stretched from as far away as western India on one side to southeast Europe on the other, and it also incorporated large parts of Central Asia”

Kanlıgeçit – Selimpaşa – Mikhalich and the Question of Anatolian Colonies in Early Bronze Age Southeast Europe, Heyd et al. 2016.

Now we’ll have to look at some genetic details from Greece in order to see how this may be reflected in the ancient DNA that we have available. As mentioned earlier, Greece didn’t see a population collapse in the period around 3000-2500 BC. There was continuity from the Early Neolithic until after 2500 BC (just small amounts of ongoing genetic exchange with neighbouring regions, but nothing remarkable about it). The steppe population that moved through the Balkans during the EBA didn’t reach Greece during that period. It was only once they had settled and admixed with local populations from the Balkans that we first see an intrusion into Greek territory, in the last part of the 3rd mill. To see the sequence of events, we’ll start by looking at 4 samples labelled as Greece_Perachora_BA (G31, G62, G65 and G76a) dated 2700-2200 BC:

To understand what this shows: in the columns there are sampled populations from different locations and periods. In this case the first two columns (after the initial one with the target samples from Greece mentioned above) represent samples from Bulgaria Chalcolithic (BGR_C) and from Greece Neolithic (GRC_Peloponnese_N), and they are supposed to represent the Neolithic/Chalcolithic population from the Balkans. The next three columns represent West Asian populations (the Kura-Araxes Bronze Age culture from the South Caucasus with samples from what is today Armenia, then samples from the Levant Early Neolithic, if I remember correctly from what is today Israel, and finally samples from Central Anatolia Chalcolithic). The last column contains samples from the Yamnaya culture from the steppe, from around 3000-2500 BC.

In the rows we have the four samples from Greece (Perachora, Bronze Age) mentioned above. And what we see is that they can be mostly modelled (97.2% on average) with the first two columns representing local populations from the Balkans Neolithic/Chalcolithic. There’s only 2.5% of West Asian admixture over whatever was already there in the Neolithic/Chalcolithic (which wasn’t much) and the 0.4% from the steppe is within the noise levels, so basically nothing at all.
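For readers who are not used to this kind of table, here is a minimal sketch of the arithmetic behind the summary figures: each row is one sample, each column one source population, and the columns are summed into three broad groups before averaging over the samples. The per-sample numbers are hypothetical placeholders chosen for illustration, not the values from the figure, and the column names other than BGR_C and GRC_Peloponnese_N are made-up labels.

```python
# Hypothetical per-sample ancestry proportions in percent.
# These are illustrative placeholders, NOT the values from the figure.
SAMPLES = {
    "G31":  {"BGR_C": 40.0, "GRC_Peloponnese_N": 58.0, "Kura_Araxes": 1.0,
             "Levant_N": 0.5, "Anatolia_C": 0.5, "Yamnaya": 0.0},
    "G62":  {"BGR_C": 35.0, "GRC_Peloponnese_N": 62.0, "Kura_Araxes": 2.0,
             "Levant_N": 0.0, "Anatolia_C": 0.5, "Yamnaya": 0.5},
    "G65":  {"BGR_C": 50.0, "GRC_Peloponnese_N": 47.0, "Kura_Araxes": 1.5,
             "Levant_N": 1.0, "Anatolia_C": 0.0, "Yamnaya": 0.5},
    "G76a": {"BGR_C": 45.0, "GRC_Peloponnese_N": 51.0, "Kura_Araxes": 2.0,
             "Levant_N": 1.0, "Anatolia_C": 0.5, "Yamnaya": 0.5},
}

# Source columns grouped into the three broad ancestries discussed in the text.
# The column names for the West Asian and steppe sources are assumed labels.
GROUPS = {
    "Balkans N/C": ["BGR_C", "GRC_Peloponnese_N"],
    "West Asia": ["Kura_Araxes", "Levant_N", "Anatolia_C"],
    "Steppe": ["Yamnaya"],
}


def group_totals(row):
    """Sum one sample's source columns into the broad ancestry groups."""
    return {group: sum(row[col] for col in cols) for group, cols in GROUPS.items()}


def average_by_group(samples):
    """Average the grouped proportions over all samples."""
    totals = [group_totals(row) for row in samples.values()]
    return {group: sum(t[group] for t in totals) / len(totals) for group in GROUPS}


if __name__ == "__main__":
    for group, value in average_by_group(SAMPLES).items():
        print(f"{group}: {value:.1f}%")
```

Note that averages reported to one decimal place need not add up to exactly 100: the three figures quoted above (97.2, 2.5 and 0.4) sum to 100.1, which is just rounding.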

However, during the period from 2300-1900 BC we have a few samples that are clearly different:

These samples derive two thirds of their ancestry from the Balkans Neolithic/Chalcolithic, and the other third from the steppe. We don’t know where these samples may have come from, but it was probably the Western Balkans, where steppe admixture was higher.

However, this was not the last movement of populations into Greece. Here we have some groups of Mycenaean samples from 1600-1200 BC:

Here we see that Mycenaean Greeks have 20% ancestry from West Asia that was not present before their arrival, indicating a very significant change in the population somewhere between 1900 BC and 1600 BC. This Mycenaean type of ancestry is the one that persisted during the classical period, as we can see from these other two samples from the Greek colony of Empuries in North East Iberia, one dating to around 750-400 BC and the other to around 350-200 BC:

Note that the above samples were outliers among the ones from that colony, where the others were local Iberians who are very different, as can be seen below:

Since we are missing samples from South East Europe from the period around 2400-2000 BC it’s difficult to pinpoint the exact origin of the Mycenaean people, but it had to be somewhere around Thrace or North West Anatolia. Once we get samples from that time and place, we’ll also be able to better assess their origin within West Asia. But since we know that the largest part of Anatolia was settled by speakers of the Anatolian branch of IE languages, it seems necessary that the origin was beyond Anatolia, with the South Caucasus being the most likely place.

A last note for completeness about Crete. There the West Asian admixture arrived earlier than in mainland Greece, and its likely source was South East Anatolia. This leaves us with two options about the affiliation of the Minoan language: it could either come from the local Neolithic inhabitants (EEF) which would basically make it an isolated language, or it could come from the Anatolian side and be an IE language of the Anatolian branch. There’s no evidence that it could be related to Greek itself. For a reference, here’s how they look:

And with this we’ll leave Europe for now (more later) and move on to Asia.

Asia

Anatolia

Anatolia was the origin of the Neolithic population of Europe, as mentioned. In the early neolithic, they had their characteristic genetic signature, but as time passed there was a significant mixing among West Asian populations that made all of them get admixture from the others. Since Anatolia is at the west end, that admixture was mainly from the east (South Caucasus/North Mesopotamia and beyond), and from the south (Levant). This makes it a bit more difficult to distinguish migrations between these areas, since we need enough resolution to see a significant change in a short period of time in a specific place to know that there was a migration and not just the ongoing general admixture that was happening all the time. The increase in admixture from the South Caucasus from the Neolithic to the Chalcolithic is evident and can perfectly justify the arrival of IE languages from the east (though let’s remind ourselves that a migration is not always necessary for the spread of a language, nor does a migration guarantee a language shift unless it’s a complete replacement as seen in Europe). We’d just need a higher resolution to find the specifics that might have brought the IE language from the Caucasus to Anatolia in the period around 4000-3500 BC.

Above, two Neolithic populations from around Central Anatolia. Below two Late Chalcolithic ones:

The shift to the “east” (more admixture from South Caucasus, less from Western Anatolia) is very clear, but this is very general and we’d need more detailed data to pinpoint a putative IE arrival.

In any case, the latest publication (a preprint) from one of the main teams doing this research already went with the hypothesis that the IE languages arrived in Anatolia from the South Caucasus, which should be correct, so I don’t think I need to expand any further on this point.

South Caucasus

Here is where my views diverge from the above-mentioned study. The reasons should be obvious already, since in that paper they argue that PIE (what they call Indo-Anatolian) originated in the North Caucasus/Lower Volga area, and from there it crossed to the South Caucasus from where it went to Anatolia. They need this scenario because they still argue that the steppe populations (Yamnaya and CWH) were the ones that spread the rest of the IE languages (all except the Anatolian branch), while I’ve been arguing so far that those steppe populations spread non-IE languages that I’ve referred to as NEBA languages. Apart from the fact that the European linguistic reality requires a non-IE substrate, not to mention a source for the known non-IE languages, the idea that the Chalcolithic societies of the South Caucasus adopted the language of the incipient pastoralists of the steppe Eneolithic is not very plausible. It would have been much more likely to go the other way, but from what we know it didn’t, and the steppe pastoralists kept their original language (at least at this stage; more on this later).

My preferred view about the arrival of IE languages in the South Caucasus is that they came from the east. When I read several papers about the archaeology of the South Caucasus some years back, there was a clear suggestion that new people started to arrive there around 4200 BC, and these people were the ones who later formed the Kura-Araxes Culture (which is more commonly dated to start around 3700 BC, probably because this was a slow migration that lasted a few centuries). The origin was unknown. However, we’ve been lucky to get some of those early samples from around 4200-4000 BC from Armenia (Areni Cave) and they are in fact considered part of the Kura-Araxes Culture despite their early dates. Coincidentally, it’s those same samples that the latest study mentioned above chooses as the earliest IE speakers in the South Caucasus, arguing that they came from north of the Caucasus since they have steppe admixture. However, those samples also have admixture from the east (though I’d also say those samples are quite strange in their genetic profile and difficult to analyse), and crucially they happen to carry a strange male lineage (the Y chromosome haplogroup L1a) which is quite rare, but clearly came from much further east and not from the steppe. Later samples are clearer in their autosomal profile, so as an illustration here are the oldest 3 samples (other than those from the 5th mill. from the Areni Cave) from the Kura-Araxes Culture, dated to the late 4th mill. (3350-3000 BC) as well as the 3 oldest from the Maykop Culture from the North West Caucasus, also from the 4th mill. (3375-3500 BC):

As seen, the largest part is still local, but there are some significant contributions from the north (represented by some samples from the steppe north of the Caucasus mountains from around 4200 BC) and from the east (represented by some samples from Turkmenistan -Geoksiur- Neolithic).

I said above that an arrival (of IE languages to the Caucasus) from the east would be my preferred scenario, rather than the only one, because I don’t consider it strictly necessary. The alternative would be that the South Caucasus was already part of the pre-IE speaking area since the Neolithic, but that would make for a larger PIE homeland, which is less parsimonious from a linguistic point of view.

The second matter I want to examine from this area is a hypothesis that, if correct, would be important, not so much for the IE languages (though it would help convince some sceptics), but mostly for the NEBA languages. It’s the origin of the Hurrians.

Hurrians from the steppe?

I’ll start by looking at some linguistic considerations that first brought my attention to this topic. For a long time, linguists have tried to find the origin of the Basque language, or at least to find some other language related to it. The most recurring suggestions have always linked it to the Caucasus languages, and more specifically to the North East Caucasus ones. This, of course, was a very controversial hypothesis, given the distance between the Basque Country and the Caucasus, together with the lack of any plausible connection at a cultural or population level. As an example of this hypothesis, here’s a quote from one of its more recent and prominent proponents, John D. Bengtson, from his book “Basque and its closest relatives: A new paradigm”:

“In direct contradiction of these kinds of statements [the uniqueness of Basque], the thesis of this book is that Basque is demonstrably related to other languages, i.e., that a scientific analysis of the evidence leads to the most probable conclusion that Basque is, at first remove, most closely related to the North Caucasian language family.”

However, with all the data that we now have, a connection between the Basque Country and the North Caucasus has become much easier to explain, given that Basques, just like all the rest of Western and North Europeans, came from the steppe, and that the North Caucasus borders the steppe from which they came. Everything indicates too that Basque is indeed a relict of the languages spoken by the CWH people who settled most of Europe around 3000-2500 BC, and North Caucasians (and especially NE Caucasians) are the modern population that’s genetically closest to the original steppe people (like the Yamnaya people), while the Caucasus mountains are an area where their language could have survived more easily once the IE languages replaced it throughout the steppe.

The next link in this chain is the fact that those looking for the origin of North (especially NE) Caucasian languages have found Hurrian and Urartian to be the most likely ancestors. While I can’t assess any of this from a linguistic point of view, I’d like to look at the genetic evidence that we have and that could help solve these questions.

We know more or less (indirectly) that people from the steppe started to cross the Caucasus around the second half of the 3rd mill. during the late Yamnaya period or early Catacomb one (the Catacomb Culture people were a continuation of the Yamnaya people). We more or less know that horses were domesticated around the middle of the 3rd mill. in the steppe, somewhere between the Caspian and the Black Sea (link). And these horses must have started to be traded across the Caucasus shortly after (the earliest sample of a domestic horse of the modern type that we have comes from Anatolia ca. 2100 BC). Whether the domestication of the horse and its trade was the reason why people from the steppe started to venture into West Asia is unclear, but it probably helped that the trade was established.

The oldest references to Hurrians that we have date to around that period (they were established in North Mesopotamia around 2250 BC). Their strong connection to horses is well known:

It seems that one of the first important results of the Mozan/Urkesh excavations, at least from the point of view of Indo-European studies, was the discovery of a beautiful sculptural image of a horse head dating from the middle of the third millenium B.C. From much later representations of horses, possibly continuing the same Hurro-Urartian tradition, one may particularly compare a bronze horse head from Karmir-Blur (VIII c. B.C.). Subsequent findings in Mozan/Urkesh have shown a number of horse figurines coming from the storeroom of Tupkish’s palace (about 2200 B.C.), some of which represent the domesticated animal. These numerous figurines, which belong to the following period of the history of Urkesh in the last quarter of the III mil. B.C., make it clear that the horse was extremely important in the life of the society. Particularly interesting seem horse figurines showing the harness, thus documenting the use of horses in transportation.

Horse Symbols and the Name of the Horse in Hurrian, Vyacheslav V. Ivanov, 1998.

From the point of view of ancient DNA, we have some interesting clues so far. The first one comes from a site in the Levant, Tel Megiddo, in modern day Israel. During the mid 2nd mill. this area is said to have had a significant Hurrian population, and apparently Tel Megiddo itself had a king with a Hurrian name. We have many samples from this site, and all of them are of local origin except 3 outliers (two of them are brother and sister, so grouped as one, and dated to 1600-1500 BC, while the third one is dated to 1688-1535 cal BCE). This is how the local samples from the same period look:

And this is how the outliers look:

Clearly, these outliers had steppe origins, with the brother (the only male) probably having the typical paternal lineage of the Yamnaya people (but due to low resolution in the Y chromosome we don’t know for sure since it’s just labelled as R without the subclade). Of course, we don’t know if these outliers were Hurrians or not, but given the historical knowledge it seems more likely that they were indeed Hurrians rather than some random travellers.

The second clue comes from later Hurrian and Urartian samples, which date from ca. 1000 BC and later. By then their steppe ancestry had been greatly diluted, but the males still largely carry the Yamnaya paternal lineage.

None of these clues alone can tell us if Hurrians came from the steppe, but together they do make for a compelling case. Ultimately, we’ll need to wait for samples from early Hurrians (pre-2000 BC ideally) to know with certainty. However, things may become a bit more complicated when we take a look again at a possible role of the Yamnaya population from the steppe when we get back to Europe.

Central Asia and North India

Finally we get to the last area that is relevant for the IE question. When it comes to Central Asia, we have to divide it into three parts: the North (mostly Kazakhstan), which was part of the steppe and was settled by the CWH people around 2000-1400 BC with the Andronovo Culture; the South (Turkmenistan, Uzbekistan and Tajikistan, which we will refer to as Turan, following the literature published about it), which had a local population dating back to the early Neolithic period; and the eastern edge (Tajikistan, Kyrgyzstan and SE Kazakhstan, up to the Altai Mountains), which we will refer to as the Inner Asia Mountain Corridor (IAMC) and which has had its own distinct population since the Paleo-Mesolithic period.

The period between 2000-1500 BC is the critical one when it comes to assessing the linguistic side of things, since during that period we have the different populations from Central Asia, plus the population of North India, plus a population that reached the Near East (the Mitanni), all speaking the same language: an early form of Indo-Iranian that was close to Sanskrit (Sanskrit itself being the form spoken in North India at the time). This means that, since the population from the steppe had just arrived in the area from the west, either they switched to the language spoken in those other places during the 2000-1500 BC period, or they managed to spread their own language to all of those places during that same period of time. The most accepted traditional view has been that the latter is what happened. Here instead, we will explain that the former is the scenario that is compatible with all the data that we have.

With regard to genetics, it’s relatively simple. What we see is that during that period of 2000-1500 BC there is low-level admixture in both populations, north (steppe) and south (Turan), from each other. This was largely mediated via females, since the male lineages largely remain unchanged in both of them. Basically, there’s really not much in the genetics that would suggest a language shift in either of them, though there is enough to see that they were in contact and therefore a language transfer is compatible with the data. But this had to be due more to cultural exchange than to actual migrations. Here are the samples we have from Turan from that period (minus two outliers from Bustan that look to have come from the South Caucasus). The earliest we have from after 2000 BC is dated to 1650 BC and they go down to 1250 BC:

Meanwhile, the steppe populations during that same period were much more diverse (it’s a much larger area too), with some complex admixture in many individuals, while others stayed much more unadmixed as seen in the two figures below:

The archaeology on which the traditional view of Indo-Iranians being originally from the steppe is based is now mostly outdated. For example, Elena Kuzmina considered that “The Andronovo provenance of the fire-cult and the cremation rite is beyond dispute” (The Origin of Indo-Iranians, 2007). She goes on to remark on its importance for the spread of Indo-Iranian from the steppes to the south:

Northern Bactria provides a unique opportunity to trace the southward migrational process of the Andronovo population and its assimilation with the locals. Since the material culture of the aborigines was highly developed and adapted to the ecological environment, the newcomers adopted in its entirety the complex of their material culture, while retaining their ethnical distinction in the most important sphere—ideology: in the cults and burial rite. As is well known, the principle condition for maintaining ideology in traditional culture is the preservation of the language which conveys mythological concepts and ritual texts. […] Since in the assimilation process in northern Bactria it was the ideological concepts of the Andronovans that took the upper hand, it means that their language conveying ideology and ritual activity became the winner too.

However, since then, it has been found that the cremation and fire cult have clear antecedents in the population from the IAMC, at sites like Begash and Tasbas, as David Anthony has already pointed out:

The pre-Andronovo mortuary custom of cremation documented at Tasbas and Begash continued into the Andronovo period as a distinctive trait of Fedorovo mortuary rituals in the Tien Shan region but with the addition of a kurgan, stone fences, and other Andronovo traits absent from the Begash Ia and Tasbas level 1 mortuary customs.

Samara Valley Project and evolution of pastoral economies in Eurasian steppe (2016).

Moreover, the archaeology that Kuzmina cites for the expansion of the Indo-Iranians to the south (from the steppe) is dated to very late layers of the sites she mentions, like Bustan or Dzharkutan, where steppe finds are in the layers from around 1000 BC, which is 1000 years too late for the spread of Indo-Iranian (the samples we have from those sites that date to the period from 1650-1250 BC are local people, with the slight steppe admixture as seen above). She also refers to the light skin and eyes of some modern populations of North India/Pakistan as proof of the steppe origin of their language, which is irrelevant for many reasons that I won’t go into here.

Basically, there is no evidence at all for the sort of huge events that should have happened in order for the Indo-Iranian languages to spread from the steppe to such a big area in such a short period of time. Nor is there any evidence that the people from the steppe could have spoken an IE language in the first place (quite the contrary, as already seen from other areas). Instead, we have a much easier explanation: the steppe populations acquired the Indo-Iranian language from their southern neighbours, along with much of the culture, technology, rituals and economy. For the change in the economy of the steppe population before and after the contact with the populations of Turan and IAMC, a graph (figure 16.12) from David Anthony’s “The Horse, the Wheel and Language” (2007) is quite revealing, showing the change from an animal-based diet to a mixed one.

When it comes to India, unfortunately the ancient DNA record is almost completely missing. Very few samples (to my knowledge) have been analysed so far and none of them published. But the DNA we have from the surrounding areas already tells us with high confidence what the early Vedic people should look like: basically just like their predecessors from the Indus Valley Civilization. We don’t have direct samples from the latter either (except one of very low quality that was published years ago), but we have outliers from the surroundings that clearly had an Indian origin (known as Indus Periphery samples). The ones from the Indus Valley itself should look similar but with a significantly higher proportion of the specific Indian signature, usually referred to as Ancestral South Indian (ASI or AASI). And indeed, the unpublished samples from the core Vedic area dating to the mid 2nd mill. (late Rigvedic period) are, as far as I know, exactly like that. But we still have to wait for samples to be published in order to be certain about it.

Some of the genetic remarks in the literature that suggest that Indic speakers came from the steppe are based on modern DNA, and as in the case of Kuzmina’s mention of the light skin and eyes of modern Dardic and Nuristani people, I won’t comment on the details of why they are irrelevant. Overall, the hypothesis of Indo-Iranian languages reaching India from the steppe is simply not possible with the current data available. If some surprising evidence emerges at some point we could revisit the subject, but for now there’s not much more to say about it.

Now let’s briefly mention the Mitanni people who moved to the Near East in the 2nd mill. BC. They have usually been considered an Indo-Aryan population (rather than Iranian), but that’s just because at the time they started to move to the west (likely around 1900 BC or slightly later), Proto-Indo-Iranian (PII) was just starting to break up and all the dialects from that time are similar to Sanskrit. The Mitanni Kingdom itself is first mentioned around 1550 BC, but the people must have started to arrive (from Turan) quite a bit earlier. We lack Mitanni samples so far, and the closest we have is an outlier from the site of Alalakh, in the Levant, dating to ca. 1550 BC, which has a clear origin in Turan. But of course, we don’t know if it’s a Mitanni sample or not. However, given the origin of the Mitanni and their language, they should all look the same as that sample, i.e., like all other samples from Turan (though with increasing local admixture as time passes, obviously, like that shown by the later Iron Age samples from Ascalon in the Southern Levant, also included below, dating to around 1200-1100 BC). Once more, we’ll have to wait for more relevant samples to confirm this.

With the above said, the question still remains as to where the origin of Indo-Iranian was. In my opinion the only way to explain the successful spread of the language is to consider that Proto-Indo-Iranian became a prestige language and eventually a lingua franca during the mature period of the IVC and BMAC, which would be around 2500-2000 BC. There seems to be no other way that can easily explain the fact that this language was spoken in both places at the same time. We do know that these two civilizations had intensive contacts, so it seems reasonable to think that during the peak of their development and trade, they established a common language that became the language of all the people in those areas, as well as those in contact with them. Whether the original pre-Indo-Iranian was spoken in one place or the other is something that would be much more difficult to assess, so I won’t get into it. After the collapse of these two civilizations, the language must have started to break up, but we know that during the period immediately after (2000-1500 BC) its varieties must still have been quite similar (Sanskrit in North India, Mitanni in the Near East and the language of the early Scythians on the steppe).

The big picture

The devil is in the details, they say, so we’ve first gone through the most important ones of each area. Now it’s time to step back and look at the big picture:

Approximate extension of steppe populations and Indo-European languages c. 2000 BC

Notice that the above map is not intended to be accurate in the details, but just to give a broad approximation. For the steppe populations, the dotted areas represent where they were alongside local populations, while the solid area is where they were the only population living in that huge area and speaking their native (NEBA) language.

The PIE homeland

From all of what we’ve commented so far, as well as from the map above, it may be clear to anyone that’s gotten this far that the PIE homeland must be placed in North Iran and Turan. The two main factors that make it necessary to place it there are the presence of IE languages in India and the Tocharian language in Xinjiang (China). From further west, those two things would be too difficult to explain.

The origin of the language must have been in the South Caspian area, from where it went with the Neolithic to Turan. These areas must have spoken pre-IE from the early Neolithic. PIE would be the phase prior to its expansion outside of that homeland, which would be close to 4500 BC. We lack ancient DNA from India to know the date of arrival of the population that formed the North Indian one, but certain anthropological studies suggest that there was a change around 4500 BC. And from the samples that we have from later dates, we know that North Indians can be modelled as a mixture of populations from Turan and ASI. What we don’t really know is whether this possible migration to India meant a split in the PIE language or it stayed as a language continuum due to the continuous contacts. Regardless of the level of divergence that may have existed, it was later erased when Proto-Indo-Iranian became the common language in the 3rd mill.

The first known split, then, should be the one that led to the Anatolian branch, which as mentioned before must have happened when people from North Iran moved to the South Caucasus ca. 4200 BC. The divergence didn’t happen in the South Caucasus itself, where the language stayed close to the core area, but rather when it went from the South Caucasus to Anatolia somewhere around 4000-3500 BC. It must have been in the southern parts of Anatolia where the language stayed more isolated from the rest and diverged from the other branches.

The next split had to be the one that led to Tocharian, and for this we’ll have to look a bit closer at the IAMC.

The Inner Asia Mountain Corridor

This corridor at the eastern edge of Central Asia had a native population that was genetically what has been called Ancient North Eurasian (ANE). This genetic profile was also found throughout Siberia in the Paleolithic, and forms part of the Native American populations (admixed with East Asian). In its pure form, it survived from the South Urals to the Altai and through this IAMC well into the Holocene. We have a Mesolithic sample from the site of Tutkaul (Tajikistan) dated to around 6200 BC, a time corresponding with the Hissar Culture, which probably started to have contacts with the neighbouring Neolithic regions and eventually led this population of the IAMC to adopt pastoralism during the 6th mill. (link). We have evidence (though indirect) that this population was moving between Central Asia and China, since they’ve been found to have seeds that originate from both places (see, for example, Frachetti et al. 2014). Some indirect evidence comes too from faunal remains in Inner Mongolia (China), where domestic sheep of the West Asian type have been found and were probably there since the mid 5th mill. (link). In the Altai, the earliest evidence also comes from seeds, dating to the end of the 4th mill. (link), though it’s probable that they were there earlier.

What the evidence suggests is that this population adopted an IE language from their southern neighbours (from Turan) at an early date, probably before 4000 BC. It may have been around 3500-3000 BC when part of this population settled in a more permanent way in Xinjiang, which led to the partial isolation of their language, which would evolve into Tocharian (while those who stayed along the IAMC would have continued to evolve their language in conjunction with that of Turan, becoming speakers of Proto-Indo-Iranian when it became the language of BMAC). From a genetic point of view, we can look at a few samples that would support this idea.

A sample from the site of Dali (Kazakhstan), part of the IAMC, dated to 2700 BC already shows some admixture both from the southern neighbours of Turan and from the steppe population that arrived in the Altai region ca. 3000 BC, Afanasievo, which shows how these people were moving along the IAMC from north to south. However, samples from a later date (c. 2000-1800 BC) from the Tarim Basin in Xinjiang show them to be unadmixed, suggesting a larger degree of isolation from before the date of the Dali sample:

Archaeological and genetic evidence already provide a good basis for the idea that Tocharian must have come from this population (the idea that Tocharian may have come from the Afanasievo people lacks both types of evidence, for example), and this view also avoids the linguistic problems that were always found in the alternative Afanasievo hypothesis.

The importance of this population when it comes to the spread of IE languages doesn’t end there, since as we’ve seen before they may have been the first ones responsible for introducing the Indo-Iranian language to the steppe people around the eastern Altai region, where their influence is visible in the Fedorovo burial rites.

Back to Europe

We’ve seen so far a probable way in which IE languages must have reached South East Europe, but now we’ll have a look at how they spread to the rest of Europe. The details are still fuzzy and not too important for the purpose of this post. The Steppe Hypothesis required a more detailed explanation, given that not much else other than the languages would have come from the steppe. (Obviously, the discovery that the people of most of Europe came from the steppe gave the theory a perfect basis; it’s just that the times and places of their expansions do not match those of the IE languages, and that’s now its main problem.) By contrast, when it comes to something going from West Asia to SEE and then spreading to the rest of Europe, there’s nothing controversial about it. Basically, everything came from West Asia to SEE and then spread throughout the continent, whether it was farming, innovations like metal working in its different varieties, writing systems, coinage, civilizations themselves or even Christianity. That’s just the natural way things went in ancient Europe.

We first have to look at the possibilities of how IE languages spread throughout the Balkans. We’ve provided a credible scenario for Greek, but is that scenario valid for the rest of the Balkans? Let’s look at this problematic question (indeed, the most problematic one). The first solution would be that the cultural package from Thrace and Greece was largely responsible for spreading the IE language throughout the Balkans, since outside Bulgaria and Greece (maybe Romania to some extent), there doesn’t seem to be any West Asian ancestry between 2000-1500 BC, which would be the time when we’d need IE languages to have spread throughout the Balkans. In any case, let’s take the chance to have a closer look at the population dynamics that took place in the Balkans during the Bronze Age, which will also show the big difference with Northern and Eastern Europe. We have just enough samples from Bulgaria to show this process:

People from the steppe (Yamnaya Culture) started to move to the Balkans c. 3200 BC, and this is what steppe communities from Bulgaria c. 3000-2800 BC looked like:

And here are contemporary local communities from that early period:

As can be seen, there’s a stark difference between them, with the steppe communities having very little admixture from locals, and local communities having very little admixture from the steppe ones (with two outliers at the bottom, from a first and second generation admixture event presumably).

After a few centuries, this is what a steppe community would look like (c. 2400 BC):

And what a local community looked like after a few centuries too (2800-2500 BC):

The steppe community had only one third of its ancestry left, while the local community had some 10% admixture from the steppe. By the time the communities finished admixing, this is what they looked like (samples from the Early Iron Age, c. 1000-500 BC, since we lack samples from the Late Bronze Age, but they should be about the same):

Once the communities from both sides fused, their paternal and maternal lineages should more or less correspond with the amount of admixture contributed by each community. For example, in the samples above, there are 6 males: 5 of them have local paternal lineages and 1 has a steppe paternal lineage.
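As a rough illustration of that correspondence, here is a minimal sketch that compares the observed share of steppe paternal lineages (1 of 6, as quoted above) with an assumed level of steppe ancestry. The 15% figure is only an assumption for the sake of the example (roughly the level mentioned for the eastern Balkans in the comments below), and the function names are hypothetical. With only 6 males the comparison is necessarily very coarse.

```python
from math import comb

def lineage_share(count: int, total: int) -> float:
    """Fraction of sampled males carrying a given paternal lineage."""
    return count / total

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent draws with rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Counts quoted in the text: 6 males, 1 with a steppe paternal lineage.
males, steppe = 6, 1

# Assumption for illustration only: ~15% steppe ancestry in the fused community.
assumed_steppe_ancestry = 0.15

observed = lineage_share(steppe, males)
chance = binomial_pmf(steppe, males, assumed_steppe_ancestry)

print(f"observed steppe paternal share: {observed:.1%}")
print(f"P(exactly {steppe} of {males} | assumed ancestry): {chance:.2f}")
```

The observed share is about 17%, which is in the same range as the assumed ancestry level; a larger sample would be needed to say anything more precise.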

In the Western Balkans it seems like steppe communities represented a higher percentage of the population, since we see from 2000 BC and later some 30% steppe in the mixed communities (with paternal and maternal lineages from the steppe also being at around that level). Here are some Late Bronze Age samples (c. 1200-1100 BC) from Montenegro:

Clearly more steppe admixture and no West Asian admixture.

To reiterate what was said in the first part of this post, the sort of evidence shown here is what we lack from Northern and Western Europe. Not because we lack samples (we actually have a lot more), but because at the time the steppe communities started to arrive, the Neolithic ones were mostly gone, and where they still lived it was just long enough for the steppe communities to take females from them before they died out. So not only do we lack direct evidence of any of those few communities that survived until the arrival of the steppe, we also can’t show what a fused community between steppe and local would look like after several centuries, because such a thing never happened. There were no mixed communities. The only ones that existed were the ones from the steppe, with 100% of the paternal lineages being from the steppe.

Back to the problem about the Balkans. We need the Indo-European languages to be all over the Balkans by 1500 BC or shortly after, but we don’t have any clear evidence of how this may have happened. Genetics don’t give us any solution, so only archaeology can help here. The spread of IE languages had to have been mostly a cultural transmission, but I will leave this for people with more expertise in the archaeology of the Balkans and meanwhile offer a possible alternative to this cultural transmission.

We could speculate that the Yamnaya people had already shifted to an IE language from the Caucasus in the period from 3500-3300 BC (i.e., after the CWH people had separated, and probably Afanasievo too). A language transfer across the Caucasus is much more plausible during this Maykop phase than anything related to the preceding Darkveti-Meshoko period. And the language transfer would go the natural way, from the more settled, higher culture society to the more mobile, pastoralist one. In this scenario, Yamnaya would have spread the ancestor of Italo-Celtic, Germanic and pre-Balto-Slavic to the Balkans before being replaced in the steppe by the Srubnaya Culture (c. 1800 BC), which would have brought a non-IE language again until the arrival of the Scythians. However, while having evidence of actual people from the steppe moving around the Balkans seems better than no evidence from West Asian admixed populations, it’s still true that they were the minority, lived separated for quite a while from the locals and didn’t have a superior culture that would attract the locals to it. Rather the contrary. So it’s up to each reader to decide if this scenario really improves things over the first one, where cultural transmission would be the basic reason for the adoption of IE languages. Lastly, this alternative scenario would be incompatible with Hurrians being from the steppe, so if the latter is confirmed it would invalidate this possibility.

From the Balkans to the rest of Europe it won’t be of much help to get ancient DNA, because people within Europe were already very similar to each other, and it’s difficult to detect movements of people from genetics unless we have a very high resolution. From what we know about Celtic or Italic, we shouldn’t expect a large number of people to have been migrating with the languages as they expanded (and so a very small genetic impact). Following the spread of innovations like iron or war chariots pulled by domestic horses may be a better way to track the spread of IE languages throughout the rest of Europe (war chariots may have already played a role in their spread through the Balkans, at least Greece). I will be very brief about this, since the details of it are beyond the scope of this post.

For the Balto-Slavic languages we have some constraints that allow us to know the approximate place and time where they formed, since their formation was strongly influenced by an Indo-Iranian language. We know that Indo-Iranian started to be adopted on the steppe at its eastern edge around the Altai region of South Siberia shortly after 2000 BC. These early adopters could be considered Proto-Scythians, and the genesis of the Scythian culture throughout Central Asia would continue till around 1200 BC. At that point, they started to move to the west through the steppe, replacing the preceding Srubnaya Culture (which like the Andronovo Culture was a descendant of the Sintashta Culture, but didn’t go through the language shift that happened in the Andronovo one and therefore would have still spoken its original NEBA language). The Scythians may have arrived at the western edge of the steppe around 1200-1000 BC, which would be the earliest date for starting contacts with IE speakers from the adjacent area in Europe.

The population that would become Proto-Balto-Slavic must have already spoken an IE language by then, but in an older centum form. This would mean that IE had already spread by then to the north of the Balkans, as was said earlier to be required. A good candidate, given its time and location, for being the culture where Proto-Balto-Slavic formed would be the Chernoles Culture, which started at the end of the 2nd mill. and continued until 500-200 BC. If these were Herodotus’ Scythian ploughmen, as speculated (no reference there by who or why), it would align very well with this possibility, since we should be looking for a population native to Europe and being sedentary farmers, not nomads, but who shared several cultural traits with the Scythians, which would easily explain the influence in their language too.

Notice that dating Proto-Balto-Slavic to around 1000-500 BC and to that approximate area is something necessary due to the clear Indo-Iranian influence that cannot be explained in any other way. After that formation period, we’d have the Baltic branch separating around the latter stage and expanding to the north. The details of this are something I’ve not tried to figure out and it’s not relevant for the purpose of this post. The important thing is that the case of Balto-Slavic formation, which can be located and dated with significant accuracy, should serve as an example of how IE languages formed and spread to the rest of Europe from the Balkans. Baltic languages (considered to be a very old form of IE) dating to around 500 BC, when they started to expand to the north, should also help to put into perspective the age of IE languages in Europe.

The details of other language branches should be somewhat analogous. The Italic and Celtic proto-languages (whether one prefers to consider Italo-Celtic a proto-language or just some areal features that defined an Italo-Celtic sprachbund area) would have formed in the north-western parts of the Balkans and adjacent areas, with Italic then separating and moving into the Italian peninsula while Celtic expanded to the west from around the eastern part of the Alps, mostly as always proposed.

Germanic is the least clear one, but it should have been a similar process. If we consider that the Chernoles Culture was roughly preceded in the area by the eastern part of the Trzciniec Culture, and that preceding Trzciniec culture had already become IE (somewhere around 1800-1500 BC), then the western part of it would have become IE too and would already be in the right place and time to be ancestral to Proto-Germanic. I’m not specifically proposing that scenario, it’s just for the sake of giving an example.

UPDATE: Checking the available samples, I noticed we have a good sequence from Czechia from the Bronze Age to the Iron Age. Looking at them, I see a significant change between 1600 BC and 1500 BC approx. coinciding with the end of the Únětice Culture and the beginning of the Tumulus Culture:

The samples from the Tumulus culture are dated to 1500-1250 BC, without C14 dating and there are only 4 of them. But the change is persistent through the Late Bronze Age (samples from Knoviz, c. 1100 BC, not shown) and into the Early Iron Age samples from Hallstatt period below:

The Bulgaria EBA samples are a relatively distant source, so the significant 27.5% impact is underestimated. With a more proximate source (like samples from Mokrin, in the Serbian border with Hungary, dating to c. 2000 BC) the impact is around 40%. Quite a big change in a small period between two consecutive cultures.

What’s missing and perspectives

There are many details missing in this brief overview, but I want to point out the ones that are technically missing in order to confirm (or deny) the basics of what I have explained here:

  • Getting samples from North India dating to the early Vedic culture (2000-1500 BC) to confirm (or deny) that they were local people. I’d give this a probability of > 95%.
  • Getting samples from North India dating to the period of 5000-4000 BC to see if there was a big change in the population at that time which could correspond with the arrival of IE speakers to the subcontinent. The probability of this I’ll leave it as “unknown”.
  • Getting samples from early Hurrians (2300-1800 BC) to know if they came from the steppe. This one is not too important for IE questions (except for that alternative possibility of Yamnaya-Catacomb cultures being IE), but if Hurrian could be confirmed to be a steppe (NEBA) language, it would be the key to investigate the whole language family. I’d give this a 60-70% chance of being correct.

There are many other samples that we are missing and would help for better knowing all the details, but I’ve listed the most important ones for the purposes of this post. Let’s hope that we don’t have to wait too long to get the answers.

Now I’d like to summarise the languages that may have come from the steppe (those I’ve been referring to here as NEBA languages). If the existence of this language family can be confirmed, it would become a very interesting and important subject for the study of European linguistic (pre)history. The fact that linguists can now know with certainty that all of northern and western Europe was repopulated by newcomers from the steppe between 3000-2300 BC, and therefore that all of that area can have one and only one substrate, common to the whole area, is an amazing step towards finally being able to study the substrates of Europe in a scientific and coherent way. Here is a list of the more likely candidates to be part of this proposed NEBA language family:

  • Basque/Aquitanian (> 95% probability).
  • Iberian (> 95% probability).
  • Tartessian (if not a Celtic language, > 95% probability. If Celtic, then it’s Celtic. I could mention Pictish in this same category, though Pictish is much more likely to be a Celtic language).
  • Etruscan (> 60% probability. Further aDNA samples won’t tell us more than what we already know, so it’s essentially a linguistic issue).
  • Hurrian (60-70% as it stands now, but getting the right samples from ancient DNA would confirm it or deny it with almost 100% certainty either way). Urartian would be linked to the outcome of Hurrian.
  • North East Caucasian (~50% chance. It depends a lot on the outcome of Hurrian).
  • North West Caucasian (Unknown probability. It’s strictly a linguistic issue, largely about it being related to NE Caucasian or not).
  • Uralic languages (~50% chance. See Appendix I for some insights into the matter).
  • Paleo-Sardinian (poorly attested, it’s again a linguistic issue where aDNA has already told us it’s at least possible. Probability around 30%?).

Finally, I’d like to stress that this post is in no way intended to be particularly complete (that would require to write a book with a lot of research) nor a definitive solution to the problems it tries to address. As the title says, this is an alternative view (interpretation) of the evidence we have so far, and my hope is that it can serve as a framework for linguists interested in IE languages or Old European languages to be able to better understand the data that is available and decide to what extent they agree with one view or another, as well as serving as a way to assess future ancient DNA studies with some of the ideas and predictions contained here to see how they fit with either view.

56 thoughts on “Origins and spread of Indo-European languages: an alternative view”

  1. -Basques and Iberians 

    I think more people are coming to terms with the idea that BB was not IE. However, the scenario they came up with is the following: a small CWC clan migrates to the Netherlands and picks up a non-IE language from there then explodes all over western Europe. 

    “Indo-European populations started to enter SEE Europe during the period from 2400-2000 BC.”

    In my opinion the only way I see this working is if the Alaca Höyük royal graves were actually from a Maykop related group and were replaced by local non-IEs as was previously believed. However, there’s no consensus amongst archaeologists on that anymore.
    If true then there were 2 separate migrations to Anatolia one southern bringing Proto-Anatolian through Adaniya (Danu?)> Proto-Hittite goes to Cappadocia (Kussara and Kanesh)/ Luwic splits in Konya. I followed a Yakubovich route for Lydian but after reading some recent stuff on Phrygian I’m not sure. The other later migration would cross the Hattian and Kaskian territory and end up being defeated and migrating across northwestern Anatolia through Thrace. There’s so much supporting the first one and not so much the second at the moment. Hopefully, we’ll get some new samples soon.
    As for Greek, Orpheus has put out this idea that the Graeco-Phrygian homeland is NW Anatolia and the supposed Phrygian migration is a Greek historian myth. Funny enough, according to the same legends the house of Atreus itself descends from an Anatolian.

    Also, there’s something pretty interesting: 
    “It is quite probable that the Greek-speaking communities were also present in the Late Bronze Age Troy (Wilusa). However, there are strong doubts that Troy, as well as the whole north-western part of Anatolia (the Troad and Mysia), can be properly defined as ‘Anatolian’ in an ethnolinguistic sense (i.e. speaking one of the languages belonging to the Anatolian branch of the Indo-European languages)”
    https://www.academia.edu/37372307/Anatolian_linguistic_influences_in_Early_Greek_1500_800_BC_Critical_observations_against_sociolinguistic_and_areal_background
    Have you seen Moldova – Zhyvotylivka (I17973) –  J2b2b2~ (J-Z42942) BTW? 

    -Hurrian
    This one is tough. We do have samples from the east Hurrian confederacy of Turrukum and Itabalhum (Urmia basin) and they do show Yamnaya ancestry. The issue is that as with all confederacies there’s no proof they spoke a single language and that language is Yamnaya derived. Though, Hurrians being the “horse people” of the Near East is pretty much confirmed.
    BTW, the Megiddo outliers are frequently thrown around as proof of Mitanni warriors from Andronovo even though it’s pretty obvious they harbor Yamnaya related ancestry. 

    “They have been usually considered an Indo-Aryan population (rather than Iranian), but that’s just because at the time they started to move to the west (likely around 1900 BC or slightly later), Proto-Indo-Iranian (PII) was just starting to break up and all the dialects from that time are similar to Sanskrit”

    Well, you know my take on this. I don’t think it’s likely that I-I splits and suddenly in a century or two these Indo-Iranian groups end up with pretty diverse religious beliefs. With *Dyeus based gods for the Iranians and him retiring in Vedas with Indra(+Mitra/Varuna) taking over. One thing is certain though, the Mitanni elites brought Vedic deities and a BMAC cultural package. Luckily for me, no one solved that yet.    

  2. @Vara

    Thanks for chiming in! Alaca Hoyuk was part of the Anatolian Trade Network, and the Royal Tombs are from around the early period when IE would have been moving west from the Caucasus to Europe. But whether they belong to those IE speakers is hard to tell.

    But yes, there would have been two movements of IE speakers from the Caucasus to Anatolia, but the second one would have been more of a movement through Anatolia rather than to Anatolia (except for the NW edge of it, which I would agree was not part of the Anatolian languages territory).

    That sample you mention from Moldova carrying a Caucasus lineage I guess is about the possibility of Yamnaya being IE? It could be, as I suggest in the post, but it largely wouldn’t have been mediated by genes (nor would some genes prove the point). I think the cultural interactions in the Maykop period (with the Mikhaylovka culture and related ones) would be enough to justify a language shift. The Yamnaya people in the Balkans were, however, a minority (some 15% in the eastern parts, 30% in the west) and it’s hard to say that they were the more advanced culture (or that they were the ruling class) that would have made the locals shift to their language. So while the whole thing is possible, I wouldn’t argue too strongly for it unless it becomes something necessary.

    For Hurrians let’s see. We need samples from the most relevant areas, but those have been at war for many years, so who knows when we will be able to get samples. I suppose that many of the people from the steppe that moved into the Near East (especially in the highlands) integrated with other groups and didn’t keep their own ethnic identity. So I don’t expect that having steppe ancestry means being Hurrian, but I do expect (to some extent) that Hurrians did come from the steppe, which would mean that they should largely have the Yamnaya paternal lineages during the first few centuries. This would be easy to verify or falsify, but the samples may not come anytime soon.

    The question about Iranian and Indo-Aryan is something we’ll hopefully be able to discuss when the time comes. I don’t have any strong opinion about it and my comments are very generic.

  3. -PIE

    What does the Kurgan Hypothesis have going for it now as in what are the supposed “IE ethnic indicators” now? The horse nonsense is disproven now even though it was obvious that the Divine Twins were not a part of the core IE belief as I used to argue with Davidski. Nomads imposing their language everywhere they went is so unlikely. A quick glance at the later Central Asian nomads and “Huns” and how often they changed languages and identities should be enough.

    The way I’ve always seen it, actual IE societies were run by warrior elites and priests with highly advanced metallurgy. The last part is why I don’t agree with Heggarty’s model. If PIE was actually a thing it should’ve been spoken in the copper age (5200-4700BCE at the earliest) and not in the neolithic. PIEs being from the northern parts of Iran, where they controlled the Greater Khorasan road, is more likely than the Zagros. I’m not sure if Turan was IE before 3500-3000 BCE, since it’s most likely a dead end as it was conquered by a group from the collapsing Iranian network that later formed the Geoksyur horizon and probably went on destroying some other stuff in the south.

    I also think the Caucasus is not likely to be the PIE homeland. Archaeology describes 3 different traditions/networks intersecting in the Caucasus: the native one, i.e. the primitive one, the Mesopotamian one and the Iranian highlander one with the advanced metallurgy. The Maykop/Novosvobodnaya elites had clear links to the Greater Khorasan Road. For whatever reason both Maykop and this Iranian network collapsed around the same time. Also, even though KA samples look pretty close to the Maykop ones, KAC is an entirely unrelated phenomenon and pretty much started out as one of the native south Caucasian traditions.

    The Zhyvotylivka sample shows that there were straight up Caucasians moving to the western parts of the steppe. But yeah, according to D. Anthony Yamnaya = Mikhailovka + Novosvobodnaya anyways. Novosvobodnaya was lightyears more advanced than anything on the steppe and suddenly Yamnaya also inherited metallurgical traditions which are usually monopolized and can easily go extinct. Yamnaya is pretty much the first steppe culture with advanced metallurgy and special smith burials. I find it unlikely that they picked up all these traditions, from what might have been the most advanced culture in the region if not the world, without language change. As of now the only issue with Yamnaya is the lack of proper settlements and agriculture. Whether or not it Indo-Europeanized the Balkans is a different story though. CWC being LPIE is the latest goalpost shifted nonsense.

    “that would require to write a book with a lot of research”

    Sooner or later someone has to reconstruct PIE culture again. Too bad I don’t have the liberal arts degree for it.

  4. @Vara

    Yes, North Iran has to be the origin of IE, and PIE should not be dated to before 5000 BC. The question is whether Turan could have spoken a different language at that time. If the Neolithic came from North Iran, then it’s likely that they spoke the same language regardless of later movements of people around the area. There’s the constraint of Tocharian too, which should not split after 3000 BC (probably closer to 3500 BC) before I-Ir took over. And then there’s India. We lack data from the area, so for now the best clue we have is anthropology proposing a West Eurasian migration around 4500 BC. IE could have arrived later, but without more data it’s hard to find a better opportunity. That’s why I include Turan in the PIE homeland at that time.

    For the Caucasus I favour an arrival by the end of the 5th mill., and that would be the first real split that led to the Anatolian languages.

    As for Yamnaya, I’m undecided. If we could solve the Hurrian problem it may push me one way or another. But the cultural links are there in any case.

    The post is intentionally generic, because getting into too many details would have been difficult both for me and for the target audience. So there’s ample space for debating all of those details.

  5. I’m not sure if Tocharian should be a constraint to any theory seeing as how there are many alternative theories, Hamp’s NWIE, extremely divergent Iranian, and even hoax.

    I think there’s a conquest scenario in India much like Turan. The IVC starts with the massive destruction and abandonment of Kot Diji. I’d say this could be the most destructive transition in the history of India. So even if the Vedic Aryans of 1500 BCE look exactly like the earlier chalcolithic population of India such destruction should’ve initiated a language shift.

    I agree with the Caucasus. I think 4200ish BCE is when Indo-Europeans showed up.

    Yeah, I think there’s many potential posts left.

  6. Greetings Alberto, it’s a pleasure to reconnect with you, and I hope all is well. Since I see you have my email address, you can get in touch with me whenever you like. I haven’t had enough time to read your comment, but I will do so over the next few days. Did I understand correctly that you agree that the Bell Beaker culture did NOT speak IE?

    For me the genetic data are clear and R1b-P312 did not speak IE.
    Have you seen the results of the latest paper that Kroonen collaborated on? U106 at the Motilla del Azuer and U152 at the site of El Argar, in addition to the usual sea of DF27.

    By the way, there is a very good linguistics blog called Trifinium, run by Joseba Abaitua; perhaps you’d be interested in taking part in it. I have commented there on genetic topics because I’m not a linguist, but it is very interesting, especially the latest threads on the hand of Irulegui and the evident relationship between Basque and Iberian. They are very good linguists and act impartially (free of nationalist whims).

    Anyway, my next comment, if you keep the blog going, will be in English; I haven’t done so this time because I’m tired of doing it on eurogenes.

    Regards

  7. Hi Gaska, thanks for stopping by and leaving a comment. It’s good to see you again and to know that you’re still active.

    Yes, I still maintain that the Bell Beaker culture cannot be IE. There is no evidence suggesting that, while there is clear evidence to the contrary. As I recall, what we disagreed about was its origin, since I see it as an extension of the Corded Ware culture and, if I remember correctly, you saw it as something native to Western Europe (though that was many years ago, of course; I don’t know what you think now).

    I’ll look for that blog, which I’m sure is interesting, although the truth is that I published this last article to give the blog a more fitting close, in which I summarise my view of the problem. So I don’t think I’ll keep publishing, but I try to find some time now and then to see what gets published that might be interesting.

    I’m not sure I’ve seen the study you mention that Kroonen collaborated on, so I’ll look for it too and add it here if I think it may be relevant.

    Regards!

  8. @Vara

    I wouldn’t necessarily associate destruction with an invasion or a language shift, unless we either know who was who from genetics or we can deduce it from archaeology by seeing a new culture that can be linked to a different area. Collapse is a recurrent theme in societies, and more so in prehistoric ones, and it’s as often an internal process as it can be due to external interference.

    Burned settlements by the end of the Early Harappan period followed (after a “dark” or “recovery” period) by the Mature Harappan Culture don’t seem to imply any change in the language by themselves. I would need to dive much deeper into the specific case to have an opinion about it, but again that would be going into details that would detract from the main points the post tries to make.

    Re:Tocharian, it’s still kind of controversial in some ways, but I don’t think that anyone still holds that it may be a hoax. And as long as we have a language there with a plausible explanation, I think it’s an important point to take into account.

  9. There isn’t much on the collapse of Kot Diji and frankly there are some really asinine takes and interpretations out there when it comes to IVC in general unfortunately all thanks to the PIE debate. 
    True, destruction doesn’t always imply a conquest or language change. However, looking at the neighbors of IVC around the Helmand we also find destruction and clear signs of Geoksyur influence around 2800BCE. Knowing the relationship between Mundigak/SiS and IVC I’d say it’s pretty likely a related group was responsible but that’s just my interpretation since I’m mostly arguing about details.
    I’m really more interested in samples from 2200 – 1700 BCE which should be the most important for Indo-Aryan. 

  10. Well, it’s a pity that you’re giving it up; with you we at least had a Spaniard who could model ancient samples independently. In any case, we’ll be in touch.

    Indeed, I believe the Bell Beaker culture originated on the Portuguese Atlantic coast, not only because of the early dates provided by Cardoso and other Portuguese archaeologists, but also because of its distinctiveness compared with other variants of this culture in Europe. Also, look at these data on Iberian Bell Beaker lineages moving north along the Tagus; for me they are further proof that the Bell Beaker culture did NOT speak IE. Nobody takes them into account, and their origin is certainly not the CWC.

    I6601 (2700 BC)-Bolores, Torres Vedras, Iberia-I2a1a/1a1-L158>Y3992
    I11592 (2700 BC)-Hipogeo de Bolores, Torres Vedras-I2a1a/2-Y3104>L161
    I0826 (2656 BC)-Cerdañola del Vallés-I2a1b/1-L460>M436>M223>Y3259
    I1970 (2500 BC)-Cueva Verdelha, Lisboa-I2a1b/1b-Y6098>S23680>PF692
    I1976 (2459 BC)-Dolmen del Sotillo-I2a1b/2a-S2555>S2524>L38
    NEO609 (2381 BC)-Hipogeo de Sao Paulo2, Almada, Lisboa-I2a1a-CTS595
    I6587 (2350 BC)-Humanejos, Bell Beaker-I2a1b/1-M223>Y3259
    I4229 (2335 BC)-Cueva da Moura, Torres Vedras-I2a1a/1a1a/1-Y3992>L160
    I0460 (2335 BC)-Dolmen del Arroyal-I2a1b/1b-L460>M223>Y3259>PF692
    I0458 (2332 BC)-Dolmen del Arroyal-I2a1b/1b-L460>M223>Y3259>PF692
    I2467 (2315 BC)-Dolmen del Sotillo, Bell Beaker-I2a1b/1-M223>Y3259
    CDM264 (2250 BC)-Cueva da Moura, Iberia-I2a1a/2-Y3104>L161
    I6543 (2212 BC)-Camino de las Yeseras 13a, Area10-I2a1a/2-P37>M423

    R1b-P312 brought or created the Ciempozuelos style in the peninsula, and of course there may have been reflux movements, as Sangmeister always argued. The only way they could have changed their native language (theoretically IE, according to the Kurganists) is if very few individuals entered and immediately mixed with Iberian women. The same would also have had to happen in Turdetania (Tartessian), Etruria, Raetia, Aquitania and Occitania, where non-IE languages were spoken when the Romans arrived, which seems very unlikely to me.

    After that, there is absolute genetic continuity in the uniparental markers (with some exceptions due to exogamy) until the Iron Age. We have 105 good-quality male genomes across all the Iberian Bronze Age cultures: 99 are R1b-M269 (94.3%), 4 are I2a1a-P37 (3.8%) and 2 are G2a2b-P303 (1.9%), so Iberia is the key to understanding the linguistic question.

    As for Yamnaya, I have reached the same conclusion as you, namely that the Maykop culture must have changed the language spoken on the steppe. I think so because all the male lineages of the Yamnaya culture, that is R1b-Z2103, R1b-V1636, R1b-PF7562, R1b-Y13200 and I2a-L699, ultimately originate in the WHG or, if you prefer, in the Balkan and Baltic hunter-gatherers. That non-IE language of the WHG is the one that their R1b-M269>P312 descendants continued to speak in central and western Europe until the arrival of the Romans.

    In my opinion, the Yamnaya culture and its descendants Indo-Europeanised the Balkans, but I think the Mycenaean language has an Anatolian origin (75% of Mycenaean male markers between 1400 and 1200 BC are local or of Anatolian origin).

    As for Italy, I think the Italic languages entered the Italian peninsula from the Balkans at the beginning of the Iron Age, thanks to Z2103, J2b-L283 and R1b-Z118.

    Celtic is a matter of central Europe and even the northern Balkans, and it only spread thanks to the Urnfield culture at the end of the Bronze Age.

    Ten years on, and after analysing thousands of genomes, Harvard still hasn’t found R1b-L151 on the steppe, and until it does, linking this marker to the IE languages is simply a chimera.

    Regards

  11. Very interesting reading, Alberto. You make some very compelling arguments.

    Have you looked at https://pmc.ncbi.nlm.nih.gov/articles/PMC8059681/, which tested for deeper historical relatedness between various language families?

    The abstract contains “Controversial clusters such as e.g. Altaic and Uralo-Altaic are significantly supported by our test, while other possible macro-groupings, e.g. Indo-Uralic or Basque-(Northeast) Caucasian, prove to be indistinguishable from a randomly generated distribution of language distances.” Their test does have a special focus on Northeastern Caucasian languages: “Other groups more recently and occasionally suggested in the literature [81–88] also test negatively”, listing among these “Basque/NE Caucasian, d = 0.544, p = 0.687”.

    Does their Basque-NE Caucasian conclusion have a decisive bearing on Hurrian’s potential relatedness to NEBA?
    And, after reading your article, would it be meaningful to account for the “Vasconising” effect on PIE reconstruction that you spoke of, shift all the Vasconic elements into a NEBA group (that includes Basque, Iberian languages and any others implicated), and then test that larger group against NE Caucasian? The reason I suggest this is that better constructed trees gave more accurate results, as seen in section 4 Results (c) paragraph “Three out of the five groups…” versus (f) starting “At first glance, this result appears”.

    On another note, Figure 1 of the same study is interesting in that it isn’t apparent that there is a peculiarly close relationship between Indo-Iranian and Balto-Slavic within Indo-European.

  12. “When it comes to India, unfortunately the ancient DNA record is almost completely missing. Very few samples (to my knowledge) have been analysed so far and none of them published. But the DNA we have from the surrounding areas already tells us with high confidence how the early Vedic people should look like: Basically just like their predecessors from the Indus Valley Civilization. We don’t have direct samples from the latter either (except one of very low quality that was published years ago), but we have outliers from the surroundings that clearly had an Indian origin (known as Indus Periphery samples). The ones from the Indus Valley itself should look similar but with a significantly higher proportion of the specific Indian signature, usually referred to as Ancient South Indian (ASI or AASI). And indeed, the unpublished samples from the core Vedic area dating to the mid 2nd mill. (late Rigvedic period) are, as far as I know, exactly like that. But we still have to wait for samples to be published in order to be certain about it.”

    There is one recently published sample which conforms to your expectation for the region. The recent https://www.cell.com/current-biology/fulltext/S0960-9822(24)00581-5 includes a Western Tibetan sample (SDLG_o) dated but 1900 years ago, which is an outlier for being the earliest that South Asian ancestry was detected among Western Tibetans. The outlier sample was successfully modelled as a two-way admixture of local ancestry and Indus Periphery admixture of the Shahr_I_Sokhta_BA2 variety:

    “We observed several two-way admixture models for SDLG_o including Shahr_I_Sokhata_BA2 and another Tibetan Plateau population. However, Shahr_I_Sokhata_BA2 harbored elevated proportions of AHG-related ancestry and the remainder from a distinctive mixture of Iranian farmer- and WSHG-related ancestry. To exclude more complex admixture models, we used the D(SDLG_o, Model; X, Mbuti) to evaluate whether SDLG_o carried additional Central Asia, South Asia or Steppe ancestries than the combinations of two sources. … These patterns supported that adding one of these ancestor populations of Shar_I_Sokhta_BA2 as the third source was not necessary for SDLG_o’s modelling.”

  13. @Gaska

    Yes, we agree that the peoples carrying R1b-L151 could not have spoken Indo-European. I extend that to R1a-M420 as well since, among other reasons, the areas where Germanic or Balto-Slavic is spoken today also require a non-Indo-European substrate.

    On other things I suppose we differ, but that is always healthy for debate. In any case, I’d suggest that future comments be in English so that the rest of the readers don’t have to use a translator.

  14. @ak2014b

    Good to see you around and still keeping up to date with the studies!

    I’ve been reading the paper about language families, but it’s difficult to know what to make of these types of analyses. Sometimes they agree with reality and sometimes they show really odd things. In any case, both Basque and NE Caucasian are long isolated and drifted modern languages, which makes them very difficult to analyse. That’s why I mention the importance that Hurrian would have if it turns out to be a language from the steppe, since it would be the closest (by far) that we would have in order to compare all the other possible ones. But we’ll have to wait for good samples to really know.

    Thanks for that other study about Western Tibet. It’s another small piece of evidence that North India had genetic continuity since the IVC. There really is nothing going for the steppe hypothesis when it comes to India and that debate should have been closed years ago. I don’t know why they still couldn’t publish a few relevant samples and put it to rest.

  15. @ak2014b

    I forgot about Indo-Iranian and Balto-Slavic. If anything, that lack of significant closeness between both could relate to the fact that they don’t share a common origin, though the results are difficult to trust blindly. The similarities are obvious to any casual observer, so the influence is very clear and necessary. And it’s very helpful for us to be able to place Proto-Balto-Slavic in a specific area at a specific time.

  16. “Sometimes they agree with reality and sometimes they show really odd things.”

    Fair point. Sometimes, computational analyses do get superseded by newer studies that have very different conclusions. It’s hard to know which such studies’ results will hold consistently.

    “If anything, that lack of significant closeness between both could relate to the fact that they don’t share a common origin, though the results are difficult to trust blindly.”

    A disconnect between Balto-Slavic and Indo-Iranian was also apparent from last year’s paper by Heggarty and others https://www.academia.edu/105010777/Language_trees_with_sampled_ancestors_support_a_hybrid_model_for_the_origin_of_Indo_European_languages
    The authors there mention a download link is available from their page at https://iecor.clld.org/. Then for instance refer to Table 1 in their paper, which also interestingly groups Greco-Armenian together.

  17. @ak2014b

    I did know that paper by Heggarty and others. It’s a great effort. But I still see important shortcomings in using those methods. In general, it would be like generating an ancestry tree without being able to take admixture into account. It would work relatively well for Paleolithic Europe, but it will start to give very strange results after the Neolithic. I wrote about this in a short post several years ago:

    https://adnaera.com/2018/10/01/ancient-dna-and-linguistics-an-introduction/

    When I was reading papers about Armenian back then I remember different authors coming to the conclusion that it must have evolved in contact with Greek and Indo-Iranian. I guess that this would favour the scenario in which Yamnaya was Indo-European(ised) and took the other European branches to the Balkans. I’m open to both scenarios that I suggested: whatever turns out to have better support by the data.

  18. Long post, Alberto, with lots of good points, and lots to comment about.

    Let’s start with the strong point:
    You provide a good summary of where and why the simple concept of “language shift/ dispersal by genetic takeover” doesn’t work for IE and the “steppe hypothesis”: Indo-Iranian, Tocharian, Greek, also the Balkans. The obvious conclusion is that we must look for other mechanisms, at least in parts of the IE-speaking area. You propose a “lingua franca” concept, with obvious difficulties still here and there (actually across most of Europe).

    Let me first add that a more radical version of the “lingua franca” concept would be that of a colonial language. Latin America is a good example: There, you find comparatively low “European” ancestry, not even speaking of the “Steppe” element in it. Still, it is now predominantly IE-speaking. I don’t need to go into detail – and actually, you seem better positioned to figure out the respective mechanisms, if you haven’t already done so.

    I have elaborated on another issue here: https://adnaera.com/2018/10/18/is-male-driven-genetic-replacement-always-meaning-language-shift/ . Resilience to possible male-effected language shift appears to be particularly strong in matriarchal cultures, which in turn appear to be frequent where there is a strong seafaring tradition. Of course, seafaring males may not return (and often don’t even intend to, cf. settlement of the South Pacific, also to some extent Vikings), so there are good arguments for keeping the (on land) property in the female line, and also for being open to interacting with foreign seafarers passing by.
    I understand that there is discussion about whether Basque society was originally matriarchal. Without being able to comment on that discussion, I may say that the Basque seafaring (whaling) tradition, apparently going back to at least Roman times, should have been supportive of a matriarchal system. Also, El Argar seems to have had at least strong matriarchal tendencies, as demonstrated in their burials. Which means that, when questioning an automatism between (male) aDNA shift and language shift anyway, there is also good reason to question the impact that “Steppe”/ yDNA R1b introgression may have had on coastal Iberian communities. Incoming foreign merchants may have left their aDNA, and also their merchandise including amber etc., but that may have been it – the mother tongue could still have prevailed, unaffected.

    t.b.c

  19. @ ak2014b: Your linguistic link (excellent read) brings me to a couple of other points, that are partly axiomatic in the sense that they guide my approach to the whole language family / IE (and also Hurrian) issue:

    1. Language families are defined across (at least) 3 dimensions, namely (i) lexicon/ vocabulary, (ii) morphology/ grammar, and (iii) phonology. The traditional focus of historical linguists has been the lexicon, via Swadesh lists etc. Your link addresses the morphology, with findings that are sometimes at odds with traditional, lexicon-based phylogenies. There may also be attempts to look more closely at phonology, however, I am not aware of any recent study in this respect.

    All three dimensions may change over time: Vocabulary may be borrowed from neighbouring families (“Sprachbund” etc.), or travel as “Wanderwort” around the globe (cf. Lat “canis” vs. Nahuatl “quintl”, both meaning “dog”). “Sprachbund” may also affect morphology, as is well documented a/o in India, where the grammar of some IE and Dravidian languages has been converging, and a more recent example would be “Spanglish”. Most resilient appears to be the phonology, especially when it comes to acquiring “foreign” sounds, which is why sound shifts are typically taken as an indication of language shift, i.e. population A adopting the language of population B, in the process transforming “unpronounceable” sounds into their nearest approximation within the old language. However, we also have “natural” sound shifts, as e.g. visible in the Satemisation of modern French: Just say “cent” to see French isn’t a Centum language anymore. The example of French, b.t.w., makes the whole Satem-Centum stuff pretty useless when it comes to historical linguistics.

    2. With all three dimensions being somewhat fluid over time, it is hard to pin-point a proto-language to even all three of them, even more to just one. Nevertheless, I postulate that PIE is technically a hybrid language:

    a.) The lexicon is strongly influenced by Uralic, hence the Indo-Uralic hypothesis. Actually, I found even more lexical parallels to S. Nikolaev’s proposed Nivkh-Algic-Wakashan family (intriguing a/o PIE *wik, as in Latin “Vicus”, Indic “wikipotis” [mayor], Germ. -wik, -wich, -weig settlement names [Narvik, Norwich, Bunswig etc.] – vs. the Wig Wam). So the PIE lexicon appears to have substantial East Eurasian influence.
    https://www.academia.edu/28569450/S_L_Nikolaev_2016_Toward_the_reconstruction_of_Proto_Algonquian_Wakashan_Part_2_Algonquian_Wakashan_sound_correspondences

    b.) Morphologically, the closest neighbour to PIE seems to be Semitic (presence of grammatical gender, synthetic, consonant-based roots). My assessment is confirmed by the paper linked by ak, which has IE and Semitic pretty close, at d=0.398 (and much closer than IE and Uralic).

    c.) As concerns phonetics, PIE is characterised by a very high number of consonants, including all the kw, bh etc. stuff. All in all, PIE is believed to have 25 consonants in its inventory. The only living languages I am aware of which get to this or higher numbers are Semitic languages (28 consonants in Arabic), Georgian (28 consonants), and of course NE Caucasian languages, with up to 70 consonants – albeit Basque, with 24 consonants, isn’t bad in this respect. Finnish, OTOH, just has 13 consonants in its inventory.

    Therefore, I postulate that pre-PIE has morphologically/ phonetically its roots somewhere not too far from the Caucasus and Semitic-speaking areas. To become PIE, it however required lexical overforming by some East Eurasian language. I intend to revisit the question of “where and when” later. For the time being, let me say that “Steppe ancestry”, in its almost 50:50 mix of CHG and EHG genes, aligns well with that postulated linguistic hybridization.

    t.b.c.

  20. 3. A key question to be answered for me is: How can language families differentiate so strongly from each other, when language contact provides strong incentives/mechanisms for convergence, at least into a “Sprachbund”? The obvious answer is: There wasn’t such language contact for a long time, so the families evolved in isolation, thereby acquiring their distinct features.

    The reason for such isolation over millennia was of course, at least across most of Eurasia, the LGM. Hence, the search for the homeland of any Eurasian proto-language, not just PIE, needs to start with a look at glacial refugia. I have started applying that approach here: https://adnaera.com/2018/12/10/how-did-chg-get-into-steppe_emba-part-1-lgm-to-early-holocene/ and invite you to revisit the maps of West Eurasian glacial refugia posted there. What becomes clear:

    a.) The NE Caucasus was uninhabitable then, therefore not qualifying as a homeland for any language family, be it PIE or Hurro-Urartian. And actually, there is hardly any archeological evidence of human activity there before the early Neolithic.

    b.) I already made my point in the linked post that both Colchis (West Georgia and the Eastern Turkish Black Sea coast) and the Southern Caspian were in principle inhabitable by humans (and might as such each have nurtured a different language family).

    c.) Aside from the Black Sea coast, most of Anatolia should have been uninhabitable during the LGM. Refugia, however, existed (i) in the Aegean including the Bosporus area (which didn’t exist then), (ii) on the Gulf of Iskenderun, (iii) in the Northern Zagros / Lake Urmia area (albeit that one appears to have been inhabited by Neanderthals, not AMHs), and, of course, (iv) in the Jordan Valley (Natufians).

    4. When it comes to placing pre-PIE, and pre-Hurro-Urartian, we need to consider a couple more languages/ language families that also emerge from that area, namely (i) Semitic, (ii) Kartvelian, (iii) Hattic, and (iv) Sumerian. Semitic may be placed into the Jordan valley, and Sumerian may have originated in the Persian Gulf Oasis that was flooded sometime around the 6th millennium BC. This leaves us with four languages/ families, and four refugia. Two of them, namely Kartvelian and Hattic, are only reported to have been present on/ near the shores of the Black Sea, so I assume their homeland there. Which leaves us with the South Caspian, and the Gulf of Iskenderun, for either pre-PIE or pre-Hurro-Urartian.
    My guess is for pre-PIE on the South Caspian – and actually also yours, Alberto, if I understand your argumentation on the eastward spread of IE languages correctly. This would place the pre-Hurro-Urartian homeland on the Gulf of Iskenderun.

    t.b.c.

  21. Now, let’s move to ANFs. Obviously, they must have weathered the LGM on the Gulf of Iskenderun, before a more favourable climate allowed them to move inland and settle the northern part of the Fertile Crescent. And, as has become clear above, I postulate they were speaking some kind of pre-Hurrian. The earliest historical attestation of Hurrians is from the northern Fertile Crescent, so we might see historical continuity there. Which is plausible, because agriculture gave ANFs a demographic advantage, ultimately allowing for settlement of much of Europe.

    The route from there to East Caucasia is clear. Actually, there is a lot of aDNA evidence for ongoing genetic exchange between ANF and CHG, and the link via the Obsidian Route is well documented archeologically. Introduction of agriculture into SE Caucasia (E. Georgia and Azerbaijan) seems to have been accompanied by a demographic shift, i.e. immigration of (pre-Hurrian-speaking) ANF. And once NE Caucasia became inhabitable again (the reasons for the pre-Neolithic hiatus are still unclear, they may have to do with volcanic activity of Mt. Elbrus), they would have been a prime candidate to move in and mix there with a Yamnaya-like population coming in from the North.

    And then, of course, we have the maritime, “island hopping” colonisation of the Mediterranean. Which started from the Gulf of Iskenderun or nearby, and should also have been essentially pre-Hurrian-speaking. That would make Iberia speak some kind of pre-Hurrian, possibly enriched by WHG substrate, before the arrival of IE. And the same applies to Sardinia (Paleo-Sardinian) and Italy (Etruscan). The latter is an obvious example of Steppe ancestry not effecting language change – the same may apply for Iberian/ Basque.

    When it then comes to linguistic proximity, or lack thereof, between NE Caucasian and Basque, we need to consider that both languages should have separated some 8,000 years ago, and absorbed quite different influences in the meantime.

    Btw, on your NEBA proposal: We know how a Hurro-Urartian language that has been overformed by IE during the Iron Age looks/ sounds – namely like Armenian. If your NEBA hypothesis is true, shouldn’t Armenian be quite close to Western IE languages? Actually, from all I have read, it isn’t.

    t.b.c.

  22. Let me stay a bit with the ANF issue: We have another language that in all likelihood emerged from Anatolia, namely Hattic. Hattic is poorly documented, but sufficiently to make clear that it isn’t morphologically related to Hurro-Urartian. While the latter was exclusively suffixing, Hattic had a strong prefixing element, e.g. marking the plural with the prefix “fa-” (“ashaf”= god, “fashaf” = gods). Prefixing is a prominent feature of NW Caucasian languages, and to some extent also found in Kartvelian, which has led some linguists to propose a relation of Hattic to both. This also makes sense aDNA-wise, as long as we suppose that Hattians were also gene-wise primarily ANFs. In that case, the ANF element in the Meshoko samples would have come from (pre-)Hattic immigration along the Black Sea coast.

    Since the southern Black Sea coast was inhabitable during the LGM, a (pre-)Hattic homeland there is generally plausible – and it might theoretically have extended towards the Bosporus area. The problem, of course, would then be having two distinct languages, from different LGM refugia, with a widely identical genetic signature, namely ANF. Having said that: There is indication of a substantial prefixing element in the pre-IE substrate, as outlined by Schrijver for continental Celtic. And the replacement of gender suffixes by articles in modern Germanic and Romance languages may be interpreted as a shift from suffixing to prefixing. For Spanish and Italian, it may of course also be interpreted as Arabic influence, but that explanation fails when it comes to Germanic languages.

    In short: The thought I am entertaining is whether there might have been two ENF languages – one related to Hurrian, distributed along the “island-hopping” Mediterranean route, and a second one related to Hattic, which expanded overland through the Balkans and ultimately formed the Linear Pottery Culture.

    This takes me to the issue of non-IE substrates in Western IE – something your NEBA proposal fails to explain sufficiently (especially since Vennemann’s “Vasconic” theory seems to have been completely debunked by now). Before I continue that point, however, I need to make a couple of other notes.

    t.b.c.

  23. You write: “Northern and Western Europe were completely (re)populated by people who came from the steppe.”

    Actually, this is unproven. Even for Britain, where aDNA evidence seems to suggest it, the Reich Lab concedes insufficient coverage. They point to a lack of sampling for a/o NE Scotland and East Anglia (the latter being Britain’s most densely populated part during the medieval period, prior to the Black Death), and Wales also seems hardly covered.
    https://reich.hms.harvard.edu/sites/reich.hms.harvard.edu/files/inline-files/2021_Armit_Reich_Beaker_Antiquity.pdf

    And, of course, your thesis would imply yDNA I2 virtually dying out across Northern and Western Europe. Well, it hasn’t. It even dominated a/o in the Lichtenstein Cave, an elite burial (whole oxen were buried there alongside the humans) from the Urnfield Culture.

    You then write: “We don’t have a single sample in the ancient DNA record from the Neolithic communities from the periods just before, during or after the arrival of the steppe communities.”

    In fact, we have lots of them. Here is France (2 times, Paris Basin & Languedoc):
    https://www.cell.com/current-biology/fulltext/S0960-9822(20)31835-2?_returnURL=https%3A%2F%2Flinkinghub.elsevier.com%2Fretrieve%2Fpii%2FS0960982220318352%3Fshowall%3Dtrue#mmc2

    Then Wallonia (South Belgium): https://eprints.hud.ac.uk/id/eprint/35254/1/FINAL%20THESIS%20-%20FICHERA.pdf
    Note also that the Spiennes flint mine has been C14-dated to have been in operation until 2,200 BC, with no archeological evidence of BB/ Single Grave presence, so it is assumed to represent continuity of the previous SOM culture. Some archeological background is here:
    https://biblio.naturalsciences.be/associated_publications/anthropologica-prehistorica/anthropologica-et-praehistorica/ap-112/ap112_77-89.pdf

    Over to their post-Michelsberg cousins of the Wartberg Culture in east Westphalia/ North Hesse:
    https://www.nature.com/articles/s42003-024-06676-7#MOESM4
    The samples are dated to after, or just before, the SGC arrival to the north, and show typical Wartberg aDNA (i.e. a strong WHG element, albeit lower than in Belgium, and intriguingly pointing to Körös/HU as the nearest WHG sample, while in Belgium it is Loschbour/ Bichon). The paper also shows, btw, that the Wartberg people weren’t completely immune to the plague, but their decimation, if it ever took place, wasn’t caused by it.
    An older, but still valid read on the Wartberg/ Single Grave relation, and their co-existence for at least some 300 years:
    https://www.jungsteinsite.uni-kiel.de/pdf/2002_2_fabian.pdf

    Up north, there is NE Jutland, with some “shortly after” aDNA, showing in the PCA the admixture just where you would expect it, namely at the interface of CW and Pitted Ware. The history of 3-4 centuries of coexistence between SGC (Western Jutland), late TRB (SE Jutland and Danish Isles) and PWC (NE Jutland) is briefly addressed in the paper; more on that can be found via the references.
    https://pmc.ncbi.nlm.nih.gov/articles/PMC7808695/

    Lastly, let me mention the Schönfeld Culture. It was once believed to be just a local phenomenon, but recent excavations have extended its primary area to approximately the triangle between Luneburg, Berlin and Goslar, with outposts from the mouth of the Elbe down to Lower Austria. They apparently not only withstood CW/BB influence, but survived both, until becoming part of Unetice by 2,200 BC. Unfortunately they cremated, so there is no aDNA.
    Here is a settlement map (which doesn’t seem to fully capture recent excavations around Berlin): https://st.museum-digital.de/object/36935
    A reconstructed house, at its original find spot, may be seen here (scroll down, third picture): https://www.steinzeitdorf-randau.de/

    IIRC, that NW Switzerland study from a few years ago had also some pre-BB, with-BB and “contemporary, but w/o BB ancestry” aDNA.

    In short: There are lots of areas in Central and Western Europe that weren’t depopulated. In some of them, mixing with the CWH took place. Elsewhere, e.g. parts of Wallonia and the Schönfeld Culture, pre-CWH-Cultures continued to exist until the onset of the Bronze Age.

  24. There is more to comment on, e.g. the issue of pre-IE linguistic substrate. Your Matasović link is a great and quite telling one, because he shows the proto-Slavic substrate to consist mostly of typical HG terms, including hazel(nut) (prime HG staple food in Northern Europe), nettle (main textile fibre), and a couple of medical herbs. No animal husbandry terms, and just a bit of farming (barley, already the main crop of EEF). In short: Just what you would expect from a population that has a quite high WHG share (as SOM or Wartberg), but very much at odds with representing a NEBA substrate.
    Moreover, unlike the Leiden School for Germanic, Matasović only identified sparse common structural patterns in the Balto-Slavic substrate, which seems to indicate that it originated from several different sources – your NEBA hypothesis would instead call for just one, pretty homogeneous source.

    So, here is my “big picture”:

    1. I am with you when locating the pre-PIE homeland in the South Caspian. And much of the stuff you write about the eastward expansion (India, Tocharian) sounds good to me.

    2. That would, however, still have been pre-IE, i.e. before being lexically overformed by some EHG language. Nevertheless, sharing broadly the same morphology and phonology would make adoption of PIE, which just differed in some lexical aspects, not too hard.

    3. The most likely place where EHG genetic and linguistic influence was absorbed by pre-IE speakers, essentially genetically CHG, was the Prikaspiiskaya culture on the Lower Volga. [I had actually intended to explore this in depth in part 3 of my “How did CHG get into Steppe_EMBA” series, but lost an already 50% finished article to a hard disk crash, and somehow never restarted writing]. Some reading on the Prikaspiiskaya culture is in the link – note especially how the author connects it to the later Khvalynsk Culture. Time, place, and archeological features fit, also for Yamnaya, and the northern Caucasus foothills.
    https://journals.uni-lj.si/DocumentaPraehistorica/article/view/43.7/6975

    4. The remainder for Western Europe is the traditional CWC story, i.e. CWC as vector for the spread of IE. With the caveat that absorbing Steppe/ CWC-mediated genes need not necessarily have meant a language switch to IE. That, OTOH, would make it easier to explain Indo-Europeanisation of the Balkans, Italy etc. IE was already present in some areas, so non-IE-speaking communities had exposure, which facilitated a language switch when further “pushes” travelled up the Danube (or through the Mediterranean). These impulses, btw, do not necessarily need to have been demographic. Bronze-age trade with Greece (British tin and Baltic amber shipped down the Danube) might have sufficed, especially with two IE-speaking cultures, namely Unetice and Mycenae, running much of the show.

    5. What still remains unclear to me is the path of IE towards Anatolia. My guess is on the Kura-Araxes Culture, but I’ll need to do more reading. Including giving your respective sections another look, or two. I may come back to them later.

    That’s it with my comments for the time being. Long post of yours, certainly inspiring, lots of comments.

  25. @FrankN

    Good to see you after so long. Quite some long comments from you too, so I’ll need more time to check all the references and give more detailed answers, but a few comments for now:

    About the initial formation of language families I basically agree with the idea that they formed during the Paleolithic (and the LGM would have selected the few surviving ones while giving them higher coherency). This dynamic of languages diverging during the Paleolithic but converging from the Neolithic onwards is something I already wrote about as mentioned in the comment just above yours.

    Whether the pre-Hurro-Urartian family would have been the one spoken by ANF is something I wouldn’t disagree with in principle, but I’ll wait for further clarification of whether Hurrians came from the steppe. We have some clues (especially from genetics) that are compelling enough for me to want to wait for relevant samples to confirm or deny.

    Re: what you say about the substrates in Northern Europe (referring to the Matasović paper I quoted) I would say that the bias towards mostly pre-Neolithic words is expected given the phenomenon I have explained in the post: It’s been assumed (just because it was the easiest option at the time) that everything that couldn’t be linked to non-IE must be IE. That’s how you can get 50% of that “unknown” IE substrate in Catalonia (+10% Greek, +23% Latin) while only 10% is Iberian. This is an issue that must be addressed before we can actually evaluate the substrates of Europe with much better accuracy.

    I think that my biggest disagreement with you is still about the CWC-BBC problem. I haven’t yet been able to go through all the references, but from first impressions:

    – The paper about France: Most samples are Neolithic. There are just five individuals that are from the relevant period. Two of them (TORTC and GBVPL) are dated to 2527 and 2525 calBC (average), and they are regular Neolithic farmers. One (GBVPK) is dated to 2387 calBC, and this one is already a typical BBC male with the typical R1b Y chromosome. Then there are two more from after 2000 BC which are also typical BBC, but females. So these samples are exactly like all the others we have.

    – The paper about Belgium: The samples themselves show the same pattern, with Neolithic-type ones having the I2a lineage and BBC-type ones having the R1b lineage. I didn’t find exact dates for each sample, but nothing stands out about them.

    – The paper about Warburg: The samples are clearly Neolithic, dated to 3300-2900 BC. So nothing special about them.

    – From the one about Denmark: “Our genetic data document a female (Gjerrild 1) and two males (Gjerrild 5 + 8), harbouring typical Neolithic K2a and HV0 mtDNA haplogroups, but also a rare basal variant of the R1b1 Y-chromosomal haplogroup. Genome-wide analyses demonstrate that these people had a significant Yamnaya-derived (i.e. steppe) ancestry component and a close genetic resemblance to the Corded Ware (and related) groups that were present in large parts of Northern and Central Europe at the time. Assuming that the Gjerrild skeletons are genetically representative of the population of the SGC in broader terms, the transition from the local Neolithic Funnel Beaker Culture (TRB) to SGC is not characterized by demographic continuity.”

    Now, if we get enough resolution we will find a Neolithic community contemporary with a steppe community. This must have happened, since otherwise the steppe communities could not have gotten EEF admixture. But those Neolithic communities died out soon after. We may even find a surviving Neolithic community that post-dates the arrival of steppe ones, if it was in an isolated enough place. But none of this has any significant relevance from a linguistic point of view.

    Though clearly the area around present day south Poland, Slovakia, Czechia and Southern Germany was where the greatest interaction between both communities took place. After all, that’s where the steppe communities got 50% of EEF admixture. So any linguistic consequence of those interactions (if it exists) had to be in that area. Not in Western Europe, since in Northern France and Great Britain they got 0% admixture and in Southern France some 5-10%, while in Iberia the remaining 10-15%. Sure, that makes Iberia the place with the most EEF admixture, but that’s just because of the cumulative nature of it (I mention this because the “mainstream” view just did some hand-waving saying that non-IE languages may have survived in Iberia due to the higher EEF admixture).

    Regarding Iberia itself, I don’t think that a sea colonization is very plausible. But even if many came by sea, it was not a male migration. These were steppe communities, with men and women. And incorporating a small amount of local females into those communities wouldn’t make the community as a whole shift their language. Especially if all the other communities were also from the steppe and spoke the native steppe language.

    If instead you argued for a language shift in Central-Eastern Europe, that would be slightly more plausible (though really very, very strange). But in that case the end result would be the same, CWC-BBC being non-IE.

  26. Hi, Alberto,

    I agree that my links posted for the Central European EN (counting France as CE) could do with clearer data. Specifically:

    – My impression for the France data is that they have found some unadmixted samples from the BB/CWC period. But unfortunately (or not), all their male samples from that period have come out as R1b.

    – Belgium has only three dated samples (apparently from other sources, the author/ Reich Lab missed out on C14 analyses). One is Iron Age, i.e., should be discarded completely. Of the other two, one is pre-BB (and unadmixed), the other one BB (admixed, and yDNA R1b). The author apparently hoped to be able to address dating via the stratigraphy. However, the C14 dates have shown the stratigraphy to be disturbed (BB/IA samples from lower layers than the SOM sample). We need to hope that someone (Reich Lab?) will do C14 analyses also for the other samples.

    – Warburg: The second sample is some 200 years younger than the first one, and should fall into the period when SGC was already present in the North (and after CWC had southerly bypassed Warburg on their way from Thuringia to the Rhine-Main area, and ultimately to N. Switzerland). Unfortunately, wiggles in the calibration curve don’t allow for precise dating during the first half of the 3rd mBC.

    – Denmark: Yes, the data shows a shift. But also (check out their PCMs) signs of something like a 50:50 admix of Pitted Ware and Single Grave DNA.

    My point was mostly about your statement that we don’t have samples from directly before and after appearance of CWH.

    A more fundamental point is: If your theory of an essentially linguistically homogeneous NEBA sphere across most of Northern and Central Europe from around 2500 BC is correct, and it would have been IE-ized just some thousand years later, we should have ended up with something similar to Armenian. Or at least one homogeneous form of Western IE. I find it very hard to explain the split between Germanic and Celtic from your theory. And even when allowing for substantial Indo-Iranic (Scythian) influence on Balto-Slavic, the latter should be much closer to Germanic and Celtic than it actually is. Your NEBA theory anyway fails, for the time being, to explain the emergence of Germanic and Celtic.
    If, OTOH, both go back to CWH overforming previous MN languages, thereby absorbing a strong substrate from them, the emergence of distinct sub-families within Western IE should be expected:

    – Pre-Germanic would have absorbed substantial Post-Michelsberg substrate. This, in turn, should have been a mixed language based on the EN/ early MN farmer language spoken in the Paris Basin (and possibly Brittany), i.e. substantially influenced by Cardial Pottery EEF language, with strong influence from the WHG languages absorbed during the late MN. The latter would have provided the HG substrate that can still be identified. And, as Michelsberg apparently absorbed different kinds of HGs – more Loschbour-like west of the Rhine, more Körös-like, partly also Baltic-HG-like east of the Rhine – we should expect diversity, i.e. absence of common morphological features, in that substrate.
    Intriguing in this respect is also the current diversity of Danish, with four clearly distinguishable dialects, which in their geographic distribution mirror the cultural split during the early EN between CWH, late TRB and Pitted Ware. Given the very small geographic distance between these dialects, I find it hard to explain them as a recent phenomenon. Having emerged as IE absorbing very different linguistic substrates (or, in the case of West Jutland, no substrate at all), however, makes sense to me.

    – (Pre-)Celtic would reflect IE with mostly “Danubian” EEF substrate absorbed. If my theory of that substrate being strongly derived from Hattic is correct, it would explain P. Schrijver’s observation of Continental Celtic (Gaulish, but not Celtiberian) being characterized by what he calls a “verbal complex”, with a total of 10 possible grammatical “slots” being arranged around the verb. And, actually, modern French with its particular “Il n’y en a pas eu du tout” constructions seems to continue such “verbal complexes” until today – they certainly can’t be explained from Romanization or Germanization.

    – Balto-Slavic is complex, as you have already written, because it has certainly received strong East Iranic (Scythian) influence. [Btw., I think that West Germanic (but not North Germanic!) also has received such influence, in that case from the Alans (Ossetians) after the fall of the Roman Empire.] Let me add in this respect that some linguists have started to question the theory of a Balto-Slavic family, ascribing similarities instead to “Sprachbund” phenomena over the last good two millennia.
    Unlike Slavic, Baltic seems to be related to Dacian (or Daco-Thracian) – difficult to prove or disprove, since Daco-Thracian is only poorly attested. For what it is worth, Dacian town names ending in “-dava” are well attested, and can actually be found as far north as Central Poland (e.g. Wlodawa on the border to Belarus, some 80 km ENE from Lublin). Dacian cultural influence, possibly including migration, on SE Poland during the late IA is archaeologically well attested. Moreover, Vennemann and Schrijver have postulated that Verner’s Law, a key component in the shift from pre- to proto-Germanic, arose in language contact with speakers of Finnic, and more specifically reflects how original Finnic speakers would have switched to Germanic. If so, Verner’s law, and ultimately the shift from Pre- to Proto-Germanic, must have arisen in a contact zone between both families, which can only have been the Baltics. This precludes presence of Baltic languages at their current location during the late IA – they would instead only have arrived there during the Migration Period, i.e. pushed north by Huns and Goths.
    https://en.wikipedia.org/wiki/Verner%27s_law

    As to Slavic: Proto-Slavic (but much less so Baltic!) has quite some Germanic adstrate, e.g. “hleb” (bread), borrowed from Gothic “hlaifs” (loaf [of bread]). The Slavic conservation of the initial “h”, which all Germanic languages except for Icelandic have lost in the meantime, attests that the borrowing is ancient. When looking for the Proto-Slavic homeland, aside from Scythian, contact with Gothic should thus also be considered. So, Proto-Slavic most likely arose within or immediately next to the Chernyakhov Culture. Last but not least, I see some Italic, more specifically Venetic influence on at least parts of Slavic, namely the “g”->”h” sound shift in modern Czech, Slovakian, Ukrainian and Belarusian, and before them in Old Ruthenian.

    Otherwise, https://www.quora.com/profile/Thomas-Wier provides for interesting reading. Wier is an American linguist currently lecturing in Tbilisi, and has been diving a bit into the relation of North Caucasian languages to other families, including Hurrian, Hattic and Basque (scroll through his posts). One of his key arguments against a genetic relationship is the abundance of consonants in North Caucasian, which is found neither in Hurrian nor in Basque. I personally think that such “unspeakable” consonants are the first thing that gets lost when non-native speakers switch language – but may be preserved when speakers of a language having them switch to another language. A case in point are the “click-consonants” in SW Africa, typical for Khoisan languages, but also present in some neighboring Bantu-languages including Zulu and Xhosa.
    This indicates to me that NW and E Caucasian, for all their fundamental morphological difference, share a common substrate (phonetic, possibly also lexical) – in all likelihood from CHG. In terms of morphology, however, they have absorbed very different influences – NW Caucasian rather from Hattic, NE Caucasian more from Hurrian (with maybe, as Ceolin e.a. suggest, also some Dravidian influence present – Maykop aDNA might deserve a closer look in that respect). This doesn’t necessarily preclude NW Caucasian being ancestral to Basque – “unspeakable” consonants are likely to get lost during language change. But in your NEBA horizon, especially if it emerged as radically as you suggest, I would expect more traces of NW Caucasian phonology than are identifiable (which is zero). From a phonological point of view, an ANF-Hurrian relation to Basque seems easier to defend than a direct relation to East Caucasian.

  27. @FrankN

    Certainly very interesting thoughts regarding linguistics there. But I do find that certain constraints and problems of time depth make the specific links complicated.

    To clarify my own view a bit better: Language differentiation between the different IE language families from Europe would have started in the Balkans itself, which partially addresses your concerns about how they would have differentiated under a homogeneous substrate. Then there’s the fact that there would easily be 1000 years (closer to 2000 in Western Europe) between the arrival of the CWH and the arrival of IE languages.

    That’s regarding the possible effect of substrates, which is a complicated linguistic issue where many opinions can be found. As an example, I’ll quote the opening sentence of Kortlandt’s “An outline of Proto-Indo-European” (https://www.academia.edu/29613427/An_outline_of_Proto_Indo_European): “Indo-European is a branch of Indo-Uralic which was radically transformed under the influence of a North Caucasian substratum when its speakers moved from the area north of the Caspian Sea to the area north of the Black Sea”.

    Then we have the problem of the shallowness of the IE language families from Europe. From Chang, Garrett et al. 2015, they get something like this:

    https://cdn.sci.news/images/enlarge/image_2516_2e-Indo-European-Languages.jpg

    Or if you look at the newer revised version by Heggarty et al. linked above by ak2014b which has pushed back the dates to fit a different hypothesis, it’s still not that much different. We can’t link the expansion of these families to the CWH. And we know at least something about the case of Celtic, being a very late arrival to Western Europe. For Balto-Slavic we have the constraint of the Indo-Iranian influence from the Scythians. (BTW, there’s a recent paper about Germanic proposing an expansion initially from the East Baltic to Sweden: https://www.biorxiv.org/content/10.1101/2024.03.13.584607v1 ).

    But finally, we have the biggest problem which is the actual languages found in the areas where the CWH expanded, basically in Iberia and to some degree in Italy. As I said above, Iberia is really not the right place to look for an in situ language shift from the CWH people (or BBC people if you prefer). Gaska posted the statistics above from the male lineages we have from the Bronze Age too.

    I forgot to respond to the possibility, raised in your earlier posts, of an early matriarchal Basque society. First, I’m not aware of any particular difference in the BBC from 2400 BC and after in the area of Aquitaine and northern Spain relative to other places. But I’d point out that the Megalithic cultures of the Neolithic were extremely patriarchal too (or patrilocal, if you prefer) as we’ve seen from Dolmen burials. More important to me is the simpler fact that this was not any sort of male driven expansion, but a migration of communities into mostly depopulated areas, which doesn’t leave much room for debating what didn’t happen.

    BTW, from Ceolin et al. linked above by ak2014b too, here’s the full chart with language distances: https://github.com/AndreaCeolin/Boundaries/blob/main/Supplementary_Information/FigS1/FigS1.png . A few strange things, like NE Caucasian and Dravidian getting along surprisingly well with too many other families, which probably explains why when measured against each other they get such a low distance in spite of the almost impossible real connection.

  28. An interesting paper I’ve come across these days relevant to the possible NE Caucasian (and Hurro-Urartian) connection with, in this case, Etruscan:

    “Etruscan’s genealogical linguistic relationship with Nakh-Daghestanian:
    a preliminary evaluation” by Ed Robertson: https://www.theelen.info/%5B20151101%5D%20Etruscan%20numerals%20-2-.pdf

    Yes, there are so many theories that anything can be found to support any theory or the opposite one. I just found this to be a more serious effort compared to some other ones and worth a look for those interested in the subject.

  29. @Vara

    To be honest, I haven’t read about it in any detail. It’s really difficult to know what to make of any of these sorts of complicated linguistic theories about isolated languages where any comparison is based on very few details.

    In general, pretty much any other West Eurasian language, and even some (North-)East Eurasian ones, has been found to have resemblances with Indo-European. I guess that’s because IE has such a large corpus that it’s likely to always find correspondences if you look for them. Many languages (including Burushaski, but also Etruscan or Basque) have been proposed to be para-IE (belonging to a sister branch of IE), just like the whole Uralic family has, or even Chukotko-Kamchatkan (which we know is impossible). The corollary of all these hypotheses would be the Nostratic language macro-family.

    Burushaski as an IE language was also proposed by Ilija Čašule more recently, and John Bengtson, who is clearly used to examining this sort of very low evidence connection, reviewed the findings and didn’t find them very convincing (https://www.degruyter.com/document/doi/10.31826/jlr-2011-060108/html ). (BTW, for anyone wanting to check out Bengtson’s summary about the Basque-Caucasian hypothesis, here’s a link: https://www.academia.edu/31720885/Euskaro_Caucasian_Hypothesis_Current_model_2017_ . You’ll probably find the evidence to be weak, but it’s up to anyone’s criteria if it reaches an expected threshold or not).

    One of the main purposes of this post was to explain the data that we have from ancient DNA in a way that can make sense for historians and linguists, because only knowing what is possible, impossible, likely or unlikely is how they are going to make some real progress instead of looking at random connections. Many linguists have started to take into account the genetic evidence and adapt their theories, but unfortunately based on poorly explained or outright wrong conclusions. My hope is that things improve in the near future, so I put here my two cents.

  30. @FrankN

    I realised that Gaska’s comment was in Spanish, so you may have missed those stats I mentioned about Bronze Age male lineages in Iberia. Here’s what he posted about it (translated):

    “We have 105 quality male genomes from all the Bronze Age Iberian cultures, 99 are R1b-M269 (94.3%), 4 are I2a1a-P37 (3.80%) and 2 G2a2b-P303 (1.90%)”

  31. @Alberto

    The evidence isn’t strong on PIE’s relationship with other families. There are so many theories out there, including Fournet’s of Hurrian being the earliest split, and another one of Dravidian being the closest language to PIE. I think it’s all nonsense.

    IMO, Burushaski being a sister language to PIE fits with your Iran-Turan hypothesis. Witzel claims the Vedic Aryans interacted with Burushaski speakers at some point but I’m pretty doubtful of that. I think it’s more likely that this language was brought across the IAMC from the many later migrations.

  32. I found a preprint from earlier this year that can be interesting for starting to look closer at the genesis of Balto-Slavic:

    North Pontic crossroads: Mobility in Ukraine from the Bronze Age to the early modern period

    In Figure 2 you can see the location and dates of the samples. They have two from the Vysotska Culture (somehow related to the Chernoles Culture, though also to Urnfield) dating from around 1100 BC. On the PCA (Fig 3. A) they plot where European Bronze Age ones would (CWC, Unetice,…) so they are probably the European clade of R1a, but they don’t have enough resolution to know (Table 1). Then there’s a Cimmerian from around 1000 BC which is clearly of Central Asian origin (from the PCA) and has Y haplogroup Q1b. It was the interactions between these two populations that produced the Balto-Slavic languages during the next few centuries.

    Then we have a group from Central Ukraine labelled as “Scythian right bank of Dnipro Illirian-Thracian basis” that date to around 700-600 BCE and are genetically also of European origin (Fig 3. B) and one of the samples has enough resolution to know it has the European type of R1a (R1a-Z283, Table 1). So these are “European” Scythians, and in the archaeological notes at the end of the paper they say:

    “This necropolis [Medvyn] belonged to the forest-steppe agricultural population, which preserved archaic burial traditions (decarnation through exposure to the elements and scavangers). […] Burials with a similar set of artefacts are found in the earlier dated kurgans of Saharna-1 burial ground (Cigleu) in forest-steppe Moldova. These facts allow to assume the movement of the population from Middle Transnistria (the oldest complexes) through Pobuzhzhia (Tyutky, Nemyriv, Vyshenka-2) to Porossia in the early Zhabotyn period. The migrants moved into regions sparsely populated by people of the late Chernolis culture, where mixing of different ethno-cultural groups occurred. The funeral rite and the set of moulded dishes indicate either the participation of the Chornolis-Zhabotyn population of Porossia in the genesis of this population, or the influence of migrants on the material culture.”

    So it seems these samples come from a population that moved there from further South-East (near Moldova) which was a more Scythian area. Still, they don’t seem to show genetic impact of Scythians, though the culture must have been strongly influenced by Scythians. It may be that these samples just spoke the same Indo-Iranian language as the Scythians, or maybe it was Balto-Slavic, who knows. But all these populations at this place and time are where one has to look for the origin of Balto-Slavic languages.

  33. Another detail from that same paper. They have some samples labelled as Thracian-Hallstatt dating to as early as before 900 BCE (the non-outlier radiocarbon dated in Table 1, carrying a Y chromosome E1b-V13). From the archaeological notes:

    “The Thraco-Cimmerian culture still does not have the status of a distinct archaeological culture in historiography. It was described in 1920–1930 based on horse ammunition items from hoards belonging to the late Urnfield and early Hallstatt period in Central and Eastern Europe.”

    Genetically they are all quite “southern” for that region (border between Ukraine and Romania), quite clearly from further south (Fig 3. A).

  34. I made an update (just before the conclusions) about a significant change I saw in the sequence of samples that we have from Czechia. Between the EBA and the MBA, specifically c. 1600-1500 BC, between the Únětice Culture and the Tumulus Culture, there’s a significant genetic impact from the Balkans (estimated around 40% maybe, depending on the source) that persists through the IA. I had not noticed that before and didn’t expect to see such a clear change in such a brief period between the two cultures.

  35. Hmmm there might be something going on in the Balkans.

    I remember there was someone claiming that there will be a few c. 2000 BCE J2a samples with extra Anatolian ancestry in Eastern Hungary. There is still that BR2 sample with an Arslantepe-related marker, but it’s nothing out of the ordinary in terms of the autosomal profile.

  36. @Al Bundy

    Well, I guess that if we refer to the linguistic role that the Andronovo people played in this, the most important thing would be that they were the ones who introduced the Indo-Iranian language to the steppe, which was spoken there for more than the next 2000 years, and the one that helped shape the Balto-Slavic languages as we know them.

    I don’t know if that answers your question or you were referring to something else.

  37. Alberto,

    thx for the links to the Etruscan and the Bengtson paper. The Bengtson paper requires registration, so I haven’t read it. But I came across a more recent paper that he co-authored, which extends the family even a bit further: “Notes on some Pre-Greek words in relation to Euskaro-Caucasian (North Caucasian + Basque)” (including a couple of parallels also to Burushaski):
    https://www.degruyter.com/document/doi/10.1515/jlr-2021-191-210/html

    While some of the stuff in both papers clearly exceeds my linguistic competence, I find the evidence quite compelling. But note, however, that both papers suggest a relation that differs from your proposal (and that both also differ in their assessment of the role of Hurro-Urartian):

    1. Robertson [Etruscan] includes Hurro-Urartian in his family, but restricts the Caucasian part to East Caucasian. This makes sense to me from a morphological point of view, but differs of course from the Moscow School, which sees East and West Caucasian united, at least as concerns phonology and some basic vocabulary. Both, however, can IMO also be explained from a shared substrate, and/or language contact/ Sprachbund phenomena.
    He doesn’t regard Hurrian, East Caucasian and Tyrrhenian [Etruscan] as directly descending from each other, but rather as “cousins”, which all go back to an undocumented forefather spoken somewhere in Anatolia: “The relationship between Etruscan and the modern Nakh-Daghestanian languages is, while relatively distant, not a “remote” or “long-range” one, and might be compared in degree to the relationship between Latin and the modern Celtic languages. Just as Latin and Celtic had a common ancestor at a time depth of the order of 4000-5000 years before present (or rather, were at least adjacent, closely-related dialects of their earlier common ancestor), Etruscan and Nakh-Daghestanian became separated at about the same sort of time depth, or slightly more than 2000 years before Etruscan is first attested. The closeness of this latter relationship would be consistent with Proto-Tyrrhenian having separated from the rest of East Caucasian during the east to west wave of settlement across Anatolia which occurred as a consequence of a period of economic prosperity between the 4th and 3rd millennia BCE.” (P.2) O.k – I am unsure whether that theory can be aligned with the available East Caucasian, Anatolian and Italian aDNA from the EMBA, but that would be a secondary point.
    More important to me is his reasoning for the merely indirect relation (p. 27). One point relates to the unusually high number of consonants in East (and also West) Caucasian, which isn’t found in Hurro-Urartian and Etruscan (and also not Basque, for that matter). Then, he points out: “The most important key feature of Nakh-Daghestanian grammar which is not shared by Etruscan is class marking, and specialists in ND have traditionally regarded items of vocabulary which show class marking to be among the more ancient members of the ND lexicon. However, we have seen above that at least some instances of class marking show signs of having occurred as innovations. (..) It is completely lacking in Etruscan and its closest relatives and in Hurro-Urartian, and, unlike those Lezgian languages which do not now have class marking, neither Etruscan nor Hurrian/Urartian show signs of ever having had it. It is reasonable, on the basis of the balance of evidence so far, to suppose that Etruscan is more closely related to Nakh-Daghestanian than either of them are to Hurro-Urartian, and that hence the ancestor of Hurro-Urartian was the first to split from the common ancestor of Etruscan, Hurro-Urartian and Nakh-Daghestanian (which we could refer to as Proto-Alarodian for want of a better suggestion), followed by Proto-Tyrrhenian at some point thereafter. Proto-Tyrrhenian and Proto-Hurro-Urartian thus never formed a separate clade of their own. Proto-Alarodian also did not have class marking, and Proto-Nakh-Daghestanian, or all of its daughters, acquired it as an innovation at some later date, perhaps due to the influence of West Caucasian object markers or Akkadian personal pronouns.”

    Now, my understanding is that Basque also lacks class marking (by prefixes), but is instead exclusively suffixing, in a quite complex way – a feature it shares a/o with Hurrian. If so, the above argument possibly precludes your NEBA hypothesis. Albeit one may argue for a relatively late adoption of class marking in NEC, after NEBA languages had already split. Which still doesn’t explain the loss of various Caucasian consonants.
    [IMO, Robertson’s suggestion of class marking in NEC being adopted from NWC – and ultimately Hattic – makes quite some sense. Intriguing in this respect: https://www.quora.com/Is-there-any-evidence-to-the-claim-that-French-is-becoming-polysynthetic?no_redirect=1, comparing modern French grammar to West Caucasian Ubykh, thereby strengthening my point of a pre-Hattic role in EEF languages (continental route, ie. LBK etc.)]

    t.b.c: My Browser has stability problems. I post this before it gets lost.

  38. 2. Bengtson/ Leschber draw the relation somewhat differently. They include NWC in the family (of course, Bengtson has worked intensively together with Starostin and other members of the Moscow School), even though my impression is that the lexical parallels given relate almost exclusively to NEC languages. While occasionally also including Hurro-Urartian examples, they do not explicitly spell out that family’s relation to their proposal – maybe, because it wasn’t deemed relevant for their topic.
    On the genesis of the relation, FN2 spells out “I think the ancestors of the Basque people were the first European farmers, bringing agriculture from Asia Minor. The first wave went along the north Mediterranean coast and I would seek its traces in Greece and Italy, plus adjacent islands. The northernmost part of this wave was perhaps the Alpine region, where the tribal languages Rhaetic and Camunic were located, probably related with Etruscan. Till the present time there are traces of Basque-like toponyms and dialect words in Sardinia.”

    Among the arguments he has provided in earlier papers is the fact that he thinks to have identified multiple lexical parallels related to agriculture and animal husbandry, but none as concerns metallurgy, pointing to a split still in the LN. And, of course, we should then expect to find linguistic traces of that maritime (Cardial Pottery) EEF language not only in Basque, but also elsewhere in the Mediterranean. Etruscan (Robertson) would be one of those remnants. Another one, which the linked paper sets out on, is pre-Greek substrate.

    The paper concludes (p. 94): “It is important to emphasize that authentic Pre-Greek words, if they are of a more or less ‘basic’ nature, are not loans directly from North Caucasian (as framed by Nikolaev), but instead substratal remnants of a Euskaro-Caucasian language related to (Proto-)North Caucasian, but surely not identical with it.” In that sense, he agrees with Robertson, though Robertson provides a clearer linguistic reasoning. Bengtson/ Leschber don’t need that reasoning, because their theory of a relation dating back to the LN precludes a direct link to the Caucasus.

    Intriguing here, OTOH, is that they find fossilized traces of [Caucasian] class markers in the Pre-Greek substrate, also in Basque, and the Pre-IE substrate in other Western IE languages (p. 87 ff). The latter includes most notably the ominous “a-mobile” described by Iversen & Kroonen, and also for the pre-IE substrate in Slavic alluded to in your linked Matasovic paper. E.g. (p. 88):

    “Latin merula ‘blackbird’ (< *mesl-) : Old High German amsala id. (< *a-msl-) : cf. (without a prefix) Basque *mosolo ‘(small) owl; buho, mochuelo’: mozolu, mozoilo, mosolo, (expressive) moxolo, motzollo id.; NC: Archi mus:al ‘wild turkey’, Chamali (dial.) mus:iya".

    If such traces were found only in your NEBA horizon, your theory would be strengthened. But their presence also in Pre-Greek, and the generally strong relation of Pre-Greek to "Euskaro-Caucasian", puts your theory in question. As I have said: I see [Pre-]Hattic at work here, contributing to the EEF language, and, via NWC, to NEC.

    As to Burushaski: Bengtson/ Leschber extend a couple of their Pre-Greek – Euskaro-Caucasian lexical parallels also to Burushaski. My rudimentary understanding, based on https://en.wikipedia.org/wiki/Burushaski is that it is morphologically quite different from IE. What stands out to me are the verbal complexes, with up to 11 slots, reminiscent of the more than 20 slots in NW Caucasian languages, or the 10 slots identified by Schrijver for Continental Celtic. In Burushaski, 4 slots come before, 6 after the verb. The most complex IE verbal construction that I have an idea of (there may be other, more complex ones) is Latin, with 2 optional slots (negation, preposition/ preverb) before, and up to 3 slots (tense/mood, person/number, passive [optional]) after the verb stem. [Western] IE may have seen a reduction in verbal complexity. However, Schrijver rather argues that at least the Continental Gaulish verbal complex reflects the influence of a Pre-IE substrate. Which – I am repeating myself – could be related to [Pre-]Hattic.

  39. Namaskar Alberto,

    Nice to see you posting after a long time. And what a wonderful invigorating read it was. I would request you to keep writing new posts. Furthermore I would request you to try and get an article published in some decent journal. You have a very important & valid point to make.

    I hadn’t really given much thought to your argument that Indo-European languages couldn’t possibly have come to Europe from the steppe. I wasn’t really sure what to make of it and had quickly forgotten all about it.

    Since then I have myself come to realise that Indo-Europeans couldn’t possibly have come to Europe from the steppe. It is only in the early 2nd millennium BCE, after intense interaction with and influence from the Indo-European Near East, that Indo-European elements seem to become visible in Europe. The influence of Minoans & Mycenaeans on the Carpathian Basin & Nordic Bronze Age is particularly stressed in this regard. Kristian Kristiansen has published stellar articles on this phenomenon. There was this one recent article from Rune Iversen that is also worth reading –

    https://www.researchgate.net/publication/381364698_Issues_with_the_steppe_hypothesis_An_archaeological_perspective_Iconography_mythology_and_language_in_Neolithic_and_Early_Bronze_Age_southern_Scandinavia

    It appears to be the case that even the 3rd millennium BCE steppe migration is too early for spread of Indo-European languages in Europe. The IE phenomenon spreads to Europe most likely from the Near East and it is post 2000 BCE. Robert Drews also has an excellent book on the subject –

    https://www.routledge.com/Militarism-and-the-Indo-Europeanizing-of-Europe/Drews/p/book/9780367886004?srsltid=AfmBOopOtHgCUuxlIv1NbtGGwWzFLx918uSPLK88yDBMguRcuiQshL4h

    It appears that Nordic Bronze Age was the incubator of Proto-Germanic while the Carpathian Basin may have been the incubator of Celtic & Italic branches and both these places were Indo-Europeanised under massive Near Eastern influence. Here is one important paper by Kristiansen on how Near Eastern warrior societies influenced & transformed ‘backwater’ Europe –

    https://www.academia.edu/124784901/The_Rise_of_Bronze_Age_Peripheries_and_the_Expansion_of_International_Trade_1950_1100_BC

    Another one by Vandkilde tries to explain how globalised the Eurasian trade network was during this period –

    https://www.academia.edu/35021739/Bronzization_The_Bronze_Age_as_pre_Modern_Globalization

  40. Frank,

    Yes, the possible relationship between any of those languages is a very difficult subject. If it’s difficult to say if NE Caucasian and NW Caucasian are related or not, just imagine how complicated it is to relate any of them to Basque or Etruscan. Even in the case of Hurro-Urartian (and those are two languages already to look at and compare, spanning a significant period) it is difficult to say if it’s related to NE Caucasian or not. That’s why I said about Bengtson’s Basque-N. Caucasian hypothesis that the evidence was weak. Even if the hypothesis is correct and the languages were indeed related as a family, it would just be too difficult to know it with certainty at this point.

    That’s why I think it’s better to start from where we have the most solid evidence first, which in this case comes from ancient DNA, and work from there already knowing what you’re looking for, what is possible and what isn’t.

    Your linguistic insights are certainly very interesting and they have a value on their own. But for me the main problem comes from this “linguistic first” approach. You find yourself attributing a language from the Cardial Pottery people to a completely different population that came from the steppe and occupied the former territory of those Cardial Pottery populations.

    Given that Basques descend from a population that arrived at their approximate current location ca. 2400 BC and that they came from Central Europe (and ultimately from the steppe), how could their language come from the Neolithic farmers of Southern Europe? The Bell Beakers that settled the area didn’t get much admixture from the Neolithic communities around SW France and N Iberia. Maybe 5% from incorporating some females from those communities that, I’ll stress it again, disappeared shortly after.

    If the Neolithic communities of Europe that had survived the collapse and were still there by the time steppe communities arrived had been able to establish a good relationship with these incoming communities by trading, exchanging wives, cooperating with each other, etc… why would they have disappeared almost immediately after when the steppe communities themselves thrived at the same time and in the same places? The evidence clearly shows that whatever the way in which the steppe communities incorporated women from the Neolithic ones that they met as they repopulated N and W Europe, they didn’t establish any good relationships with the Neolithic communities where those women came from. At the least, they just excluded them from their own networks, at which point those isolated Neolithic communities would have small chances of surviving.

    So in this context, it seems difficult if not impossible that there was any linguistic influence in the language of the steppe people from their predecessors in the area, so to expect a complete language shift is completely unrealistic IMO. My NEBA languages hypothesis is indeed a hypothesis (obviously), but not like the older ones before ancient DNA that were mostly a “best guess” based on very inconclusive evidence. It’s one based on very clear and unambiguous evidence, evidence that makes it very, very difficult to argue that the CWH (in which I include the BBC to the west and all the other forest steppe cultures to the east) didn’t speak the same language. They came from a small core population and occupied a very large and mostly empty space. And due to their mobility they kept stronger network interactions than the previous Neolithic people did. So any of them shifting to a language (and keeping it for millennia) of a foreign, small population that went extinct immediately after is really hard to explain.

    Besides, if one wants to propose that the steppe people of the CWH spoke an IE language, and that some of them shifted to a non-IE one, there’s also the problem of the complete absence of those IE languages. They all disappeared without a trace, apparently? Italo-Celtic would anyway come ultimately from the Balkans (or at best Central-Eastern Europe) and reach W Europe in the IA. Balto-Slavic formed necessarily in the IA. So other than possibly Germanic all the other ones disappeared? And without leaving traces in the non-IE ones too?

    So to reiterate, I have to insist on the “ancient DNA first” approach, and then let’s look at the languages, because in this case this equates to an “evidence first” approach (given that genetic evidence is very strong and linguistic evidence is very weak).

  41. Hi Jaydeep, good to see you too after so long. Glad you’re still around following the developments of this fascinating topic.

    Indeed, the evidence is quite clear about IE languages in Europe being a very late introduction. And the evidence for the steppe people having been PIE is not only non-existent; the idea really is incompatible with the evidence we have. Sadly, years pass and we keep seeing the same thing repeated without much thought being put into it, other than trying harder each time to solve the problems in more complicated ways. It’s also sad to see the delay in the publication of ancient DNA from North India, since that would force many people to rethink the whole thing again and maybe they’d finally come to terms with the actual evidence.

    Thanks for the links. I’ll read them as soon as I can and comment about them here.

  42. Alberto:

    1. “Given that Basques descend from a population that arrived to their approximate current location ca. 2400 BC..” Is that actually so? I mean, for the males you can probably say so. But from what I remember (correct me, when I am wrong), Basques and Sardinians are still the most EEF-like populations in modern Europe.

    Which takes us back to the point of if & under which conditions newly arriving males can effect language shift. And we anyway need to deal with non-demographic/ DNA-based explanations for the shifting (or non-shifting) to IE (lingua franca, political/ cultural dominance etc.). Including in the case of Etruscan, which obviously could withstand demographic pressure for language shift, until the Romans broke their political dominance. [You could o/c argue for Etruscan being a NEBA phenomenon. But in that case I would expect it to be closer to Aquitanian/ Basque, a/o because both had experienced similar language contacts (Continental Celtic, Latin, Punic/ Arabic). Besides, Etruscan aDNA doesn’t have enough “Steppe” to qualify as a convincing case for a demographically effected language shift.]

    There are various more recent European examples where language and genes don’t match: Hungarian, the spread of Slavic across the Balkans and also parts of Russia, or the medieval Germanisation of lands east of Elbe/ Saale, including formation of Yiddish as West Germanic language. Even for SW Germany (your link above, good read btw.), Germanic immigration alone after the collapse of the Roman empire doesn’t suffice to explain the language shift. It needed a political factor, in the form of Frankish control, to make it happen. In Lombardy, with a comparable demographic shift, OTOH, Frankish control was too weak, so it remained Romance-speaking.
    Certainly, none of the above examples provides an immediately applicable model about why Basque might have withstood “Steppization”. But we are talking about a timescale (Copper Age/ EBA), when long-range trade networks and centrally-controlled structures emerged across Europe, including Iberia (El Argar). And the Basque Country sits on one of the land passages across which Cornish tin was transported to avoid ship passage around the Iberian peninsula (which derives its name from the Ebro, used for that trade). I don’t know enough about the EBA on the Gulf of Biscay for any qualified statement, but aside from matrilocality, issues related to controlling major trade lines, and possibly using language as an advantage against competing networks (the Loire/ Rhone, Rhine/Rhone, Elbe/ Danube [Unetice] passages to the Mediterranean and Black Seas, respectively) might need to be considered as well.
    Last but not least, there has been an obvious founder effect for R1b males. This is traditionally explained by social factors (crowding-out of other males, e.g. via killing, raping, disenfranchising), but may as well relate to epidemics, e.g. the Plague. We start to get a better understanding of the role of the HLA genes in fighting various infections, including CoViD, albeit that understanding is far from being perfect. In any case, HLA genes are strongly homozygous, a/o regularly used for paternity tests. So, the “founder males” may just have had a genetic advantage against other males, and their current dominance has to do with genetic selection rather than initially “overpowering” (also linguistically) the “native” population.
    [Intriguing here: There appears to be a strongly negative correlation between genetic resistance against the Plague, and CoViD. Regions little affected by the Black Death stand out negatively when it comes to CoViD, and vice versa. Cases in point are Lombardy vs. Tuscany (the former with little documented death toll from the Black Death, but a high one from CoViD, the latter the opposite), Franconia vs NW Germany (ditto, Tirschenreuth in E. Franconia was the second most heavily affected county in Germany by CoViD, while historic reports of Black Death victims between Nuremberg and Prague, both major economic and political centers of that time, are lacking). The Basque Country also seems to have weathered the Black Death quite well – CoViD obviously not so.]

    2. “why would they [the Neolithic communities] have disappeared almost immediately after when the steppe communities themselves thrived (..)?”. Well, they didn’t. They just remained invisible to the aDNA record, for whatever reasons. I have provided examples above, including the Schönfelder Culture on the Upper Middle Elbe, which survived CWC & BB, towards the beginning of the EBA expanded a/o into Bohemia and Austria, and seems to have been one (of several) formative elements in Unetice. Schönfelder was cremating, so no aDNA.

    For Unetice see also https://www.science.org/doi/10.1126/sciadv.abi6941 (Bohemia time transect -have you considered their data in your Unetice analysis?). Interesting there a/o, with respect to yDNA founder effects: “In addition to autosomal genetic changes through time, we observe a sharp reduction in Y-chromosomal diversity going from five different lineages in early CW to a dominant (single) lineage in late CW” – in a process of some 300-400 years. Might have been social, might also have been biological (disease-related) selection. In any case, the process was longer and possibly more complex than you seem to suggest.
    BB, btw, for all their genetic similarity to CW, subsequently effected the next complete shift in yDNA (from R1a to R1b), before early Unetice re-introduced Mesolithic yDNA (I, C) in substantial proportions.

    Here is another example, which I just came across by chance when trying to learn a bit more about the EBA in SEE – if you don’t know it yet, it is anyway good background for contemplating linguistic developments there:
    https://academic.oup.com/mbe/article/40/9/msad182/7240678

    “We report 21 ancient shotgun genomes from present-day Western Hungary, from previously understudied Late Copper Age Baden, and Bronze Age Somogyvár–Vinkovci, Kisapostag, and Encrusted Pottery archeological cultures (3,530–1,620 cal Bce). Our results indicate the presence of high steppe ancestry in the Somogyvár–Vinkovci culture. They were then replaced by the Kisapostag group, who exhibit an outstandingly high (up to ∼47%) Mesolithic hunter–gatherer ancestry, despite this component being thought to be highly diluted by the time of the Early Bronze Age. The Kisapostag population contributed the genetic basis for the succeeding community of the Encrusted Pottery culture.”

    Intriguing is not only the fact that by ca. 2.200 BC there had been an extremely HG-rich, apparently hardly Steppe-affected population around in CE that had been able to genetically “overpower” the preceding, Steppe-rich Somogyvár–Vinkovci culture. They also found that population to be a WHG/ EHG mix, with slightly more EHG, whereby the WHG component was reminiscent of the one found in FBC/GAC (->Schönfelder?), the EHG one of Ukraine EN. However, they note that this mix is so far undocumented in the aDNA record. Still: “Individuals with this ancestry predating Bk-II by only a few generations appeared in Czechia, Northern Hungary, Eastern Germany, and Western Poland, indicating that the Kisapostag-associated population probably came to Transdanubia via a northern route” (good map in the article). [They didn’t check the Wartberg samples in that respect – my hunch is that these would also have provided a decent fit, given that their WHG ancestry was quite “eastern” (Körös/ Iron Gates-like)]. Moreover, the HG profile (somewhat diluted, o/c) made its re-appearance in the LBA Tollense samples.

    The subsequent MBA Encrusted Pottery Culture then is genetically described as “dilution (..) driven by contact with various local populations, genetically best represented by later Transdanubian Hungary_LBA or Serbia_Mokrin_EBA_Maros”. The dilution is in the range of 25-30%, re-introducing yDNA R1b-ZZ103 (completely missing in the Kisapostag samples), but 3 of the 5 male Encrusted Pottery samples still have yDNA I2a-L1229.

    Here you go with your (interesting) observation of Balkans (Bulgarian) aDNA having introgressed into Czechia. They can’t have arrived along the Danube. In fact, as per https://en.wikipedia.org/wiki/Encrusted_Pottery_culture: “The Encrusted Pottery culture expanded eastwards and southwards along the Danube into parts of Croatia, Serbia, Romania and Bulgaria in response to migrations from the northwest by the Tumulus culture”. The path must have led north of the Carpathians – and the Balkans remain puzzling to me, also linguistically.

  43. On a more general note: There isn’t any standard linguistic “birth rate” that universally defines the time after which a proto-language produces offspring, in the sense of individual daughter languages, and ultimately sub-families. Beyond the proverbial saying “A language is a dialect with a flag and a navy”, pointing out the political dimension, there even seems to be no widespread consensus on where a dialect continuum ends and a language family starts [Sorry, Hungary – no navy! And greetings to Luxemburg – I find Lëtzebuergesch pretty hard to understand. The same applies to Schwyzerdütsch…].

    At least two factors seem to play a role when it comes to forming daughter languages/ families, and I have looked around a bit for respective benchmarks.

    The first factor relates to morphology: Semitic languages, possibly Afro-Asiatic as a whole (estimated time depth of 16.000 years, far longer than any other language family I am aware of), are known to be especially resilient to change, for their focus on 3-4 consonant roots. A case in point is Arabic, first documented around 400 BC, so by now some 2.400 years old, w/o having produced daughter languages. Well, in fact, there are now at least four distinct regional dialects, different enough from each other that Algerian films require subtitling to be shown in the Gulf states. While still being called “dialects”, technically we might regard them as daughter languages. Still, this would give us just one linguistic generation over 2.400 years, as a first “upper limit” benchmark.

    The second factor is the intensity of foreign language contact. Again, for an upper benchmark, Hawaiian, as an example for Polynesian languages as a whole, with very little foreign language contact prior to European exploration/ colonisation. It is believed to go back to Marquesan settlement maybe in the 4th century C.E., latest in the 6th century, with later settlement (9th cent. CE) from a/o Samoa. Acc. to https://en.wikipedia.org/wiki/Hawaiian_language: “Jack H. Ward (1962) conducted a study using basic words and short utterances to determine the level of comprehension between different Polynesian languages. The mutual intelligibility of Hawaiian was found to be 41.2% with Marquesan, 37.5% with Tahitian, 25.5% with Samoan and 6.4% with Tongan.” 41.2% mutual intelligibility with Marquesan is probably beyond the “dialect” stage, so Hawaiian is clearly a language in its own right. But from ca. 600 CE to 1896 (when English became the official language), we are talking 1.300 years. During that period Hawaiian has produced differing dialects on the various islands, but nothing that any linguist seems to regard as a daughter language.

    On to IE: The classical benchmark, for a medium to high foreign contact scenario, is Old French. It was first documented in the Oaths of Strasbourg (842). This was exactly 900 years after Caesar started the conquest of Gaul, so a nice and round benchmark. I call it a “medium to high contact” scenario, because there had obviously been Continental Celtic substrate involved, Aquitanian/ Basque, the usual Roman mixing of populations, including oriental Jews, Germanic (Franks, Visigoths, coastal contact with Vikings/ Normans), remigrating Insular Celts into Brittany, plus other migrating groups at least passing through, partly also settling (Alans/ Ossetians, e.g. around Tours, possibly a few more Caucasians migrating with them).

    Lower Benchmark: Latin America, more than 500 years after Columbus, w/o forming own Romance languages. Certainly also medium to high foreign contact, that, in addition to native languages, may include West African slaves, and immigrants from outside Iberia – on top of Spain’s linguistic diversity exported there.

    Under these considerations, I am generally fine with Germanic having a 3-tier structure (family, sub-families, individual languages). The fourth tier, as now becoming apparent under High German with Schwyzerdütsch, Lëtzebuergesch and Modern High German, is one too many. But it is perfectly explainable, because High German actually split too early from West Germanic. And did so for a good reason, namely Romance speakers shifting to West Germanic (Schrijver has discussed this in great detail).
    With the common dating of Proto-Germanic to ca. 100 BC, the separation of North and West Germanic was actually also too early (and to me, North Germanic feels quite a bit more separated from German than, e.g., Italian is from Spanish). The underlying reason should equally have been substantial language shift / acquiring substrate. The process has been ongoing in Norway when it comes to Saami, albeit the originally absorbed substrate might have been a completely different language.

    Then – Italic: The linguistic diversity is documented from the late IA / Roman republican period. And it certainly goes beyond dialects. By the 6th cBC, there existed 3 clearly distinguishable families: Latino-Faliscan, Osco-Umbrian, and Venetic (sometimes considered a separate family inside IE). They have shared certain defining sound shifts, especially PIE “Gh”->“h” (“hortus” – “garden”) and “Bh”->“f/v” (*bhergh->Lat “for(h)tis”, German “Burg”, “frater” – “brother”), possibly under Etruscan influence. They also share the reversed sequence of qualifier (noun) and specifier (adjective), e.g. saying Villa Nova, instead of New Town. The latter is ascribed to Semitic (Punic) influence, and might just have occurred by the 7th or 6th cBC. Latin “hospes” (host, lit. “guest master”) preserved, fossilised, the standard IE sequence and attests the recency of the shift.
    Albeit it cannot be excluded that language contact with Etruscan and/or Punic has introduced specific changes in just one or two of the sub-families, the fact that all three have been affected by similar changes suggests that the internal differentiation has other reasons, reaching further back. Some glotto-chronologists think of up to 4.500 years, which seems too long to me. But the Heggarty e.a. 2023 paper places Proto-Italic some time around 2.000 BC, which could make sense to me. If we assume development outside Italy, e.g. on the Balkans, and entry around 1.200 BC, with the transition from the Terramare to the Villanova Culture, the three groups/ families should already have entered fairly differentiated, at the transition from “somewhat, but hardly, mutually understandable dialects” to separated languages. Which seems implausible to me. Alternatively, several authors consider two or more waves of Indo-Europeanisation, with Venetic possibly being the offspring of the latest (Villanova) wave. That doesn’t rule out entrance from the Balkans, with in that case another wave crossing over from Albania. Still, Italic remains a specific puzzle, that seems far from being solved to me.

    On to Celtic. First, two general remarks:
    1. Hallstatt was most certainly not Celtic, at least not exclusively. In all likelihood, Celtic wasn’t even the dominating language. The name-giving village of Hallstatt is surrounded by Venetic inscriptions to the South (Gailtal, Carinthia) and the West (Ampass/ Tyrol, a bit W from Innsbruck), and the antique Tergolape, a clearly Venetic name (c.f. Tergeste->Trieste, Opitergium->Oderzo), is currently associated with Schwanenstadt/ AT, 70 km to the north of Hallstatt. So, the chance is high that the people from Hallstatt proper spoke Venetic or something closely related to it. Hallstatt had substantial cultural influence from the NW Adriatic sea, most likely transmitted via, and by, Venetics. Intriguing in this respect is also how the Czech/ Slovak / Ukrainian “G”->”H” sound shift (“gora”->”hora”) mirrors the Italic shift described above. If you apply Venetic sound shifts to PIE *bʰérǵʰos (hill, mountain), you arrive at something like vrch[s] – which is a Czech word for hill or mountain (Proto-Slavic *verh, w/o accepted IE etymology, if one excludes borrowing from Venetic).

    Having said that, modern archeology distinguishes between East and West Hallstatt. For good reasons – there is quite some evidence of military conflict. And the Ehrenbürg in Franconia, a densely settled Plateau with some 3.000 inhabitants, believed to have been a major centre of East Hallstatt, was completely destroyed and not resettled at the transition between (East) Hallstatt and La Tene.
    West Hallstatt, OTOH, may well have been Celtic. Unlike Bavaria, SW Germany is also where we find multiple Celtic toponyms ending on -dunum (“enclosure”, e.g. Cambodunum ->Kempten, Tarodunum-> Zarten n. Freiburg), or -briga (“hill, city”, Sarabriga -> Sarrebruck, Bregenz [<-Brigantium]). Hence, your linked article, dealing with SW Germany (West Hallstatt), may well be correct when addressing the inhabitants as Celts [but note how careful they are to speak just about West Hallstatt, not Hallstatt in total].

    2. Some Greek historians (Herodotus?) stated that the Celtic homeland was near the source of the Danube, which has led many historians to locate it in SW Germany. However, antique authors often selected different source rivers. Ptolemy, e.g., seems to, orographically correctly, have taken the Vltava as main source of the Elbe (and possibly the Oker as source of the Weser). If we take the Greek statement as meaning "near the source of the Inn", we end up near the lakes of Como and Lugano, where the oldest Celtic (Lepontic) inscriptions have been found.

    Homeland aside: By the 6th cBC, [Continental] Celtic was also already quite differentiated into at least Celtiberian, Gaulish, and possibly Lepontic (arguably just a dialect of Gaulish). The differentiation is explainable from different language contact, acc. to Schrijver in the case of Celtiberian, e.g., with Iberian language (and Lepontic with Raetic/ Etruscan). Therefore, Proto-Celtic doesn't require such a time depth as does Italic. Nevertheless, to develop into a distinct family requires a reasonably early differentiation, also geographically, from whichever closest relative (Italic, as per Schrijver, or Germanic, as per Heggarty e.a.). Heggarty e.a. place the split between Pre-Germanic and Pre-Celtic to the third mBC.

    This works with the "IE from the Steppe" theory, especially if you assume Pre-Germanic as Single Grave, Pre-Celtic as BB-derived. And the time depth might be shortened under the assumption that both absorbed a fairly different linguistic substrate, which would have been a relatively "classic" continental EEF language in the case of BB, but a heavily WHG-overformed (maybe even WHG-EHG-dominated) TRB/ GAC language for SGC. However, I fail to see how the Balkans could have brought forward such differentiation.

    I leave it here with IE ftb. Proto-Balto-Slavic is anyway a special case, Proto-Greek complex for interaction also with several Anatolian languages, and there seem to be other readers that are far more acquainted with Indo-Iranian than I am to take up my lines of thought in that area.

    But the key problem is: To explain the diversity of Western IE, you need quite some time depth, and the presence of fairly different linguistic substrates that are being absorbed. Ideally both, because language contact/ shift substantially reduces the time depth required for linguistic differentiation. Some 2-3 millennia just on the Balkans seem too short for that – and we need to look at ca. 500 BC as the period when Italic, Celtic and Pre-Germanic were already distinct sub-families, not just sister languages.
    Moreover, your NEBA hypothesis, as it is presented, would lead to incoming IEs encountering a linguistically fairly homogenous population. You are talking about some 1.500 years between CWH/NEBA arriving, and then switching to IE. Without intensive language contact (depopulated, females gradually absorbed, no surviving LN population around), we might at best assume one linguistic NEBA generation, i.e. split into a number of languages that are still pretty close to each other, partly still mutually understandable (think of South Slavic and a comparable time scale, for that matter). I can't see such linguistically homogeneous population bringing forward the differentiation between Celtic and Germanic – not even speaking of the specific features of Gaulish, which appear to live forth in modern French and make it such an outlier among Romance languages.

    You have addressed that problem yourself in a comment above, as "shallowness of the IE family in Europe". The only plausible solution to this problem IMO is that incoming IEs encountered fairly different substrates to interact with and ultimately absorb, including by language shift of a sizeable portion of "natives".

    Which calls, first of all, for the LN population to have already spoken fairly different languages. Various degrees of interaction with HGs (and, apparently, different kinds of HGs, from Spain-like to substantially EHG-enriched), plus linguistic differentiation already between incoming "island hopping" and "Danubian" ANFs, would have provided for such substantial differentiation of LN languages. But secondly, this differentiation needs to have survived until IEs started coming in (or effecting language shift in other ways). This precludes any "NEBA reset" – at least in the radical form proposed by you.

  44. Frank,

    You do raise a lot of interesting questions there (which, needless to say, is what I want and expect from the comments here), and it will take me a while to address all of them (and for some I won’t have any clear answer, I wish I could have answers for everything). So for now I’ll start from the beginning, your first point which is pointing at the core of this debate:

    “1. “Given that Basques descend from a population that arrived to their approximate current location ca. 2400 BC..” Is that actually so? I mean, for the males you can probably say so. But from what I remember (correct me, when I am wrong), Basques and Sardinians are still the most EEF-like populations in modern Europe.

    Which takes us back to the point of if & under which conditions newly arriving males can effect language shift.”

    There is a reason why I started this post talking about Western Europe on the claim of it being the easiest part, thanks to it being at the west end of the world back then and therefore free of complexities present in other places more at the centre, and also because of the quite extensive sampling available for the relevant periods.

    Since you reference there the males as the main drivers of the CWH/BBC migration, I really have to correct you about this. This was a migration of communities of people, LN/EBA people, shepherds, with children and grandparents, husbands and wives. Not just because we were still far away from the time where armies existed, but more importantly because the data shows us this. If the people from the steppe were migrating as groups of men and acquiring women from Neolithic (EEF) communities, their steppe ancestry would have diluted to almost nothing in a few generations (by 2500 BC they would be indistinguishable from EEFs except for their Y Chromosome).
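
    The dilution argument above can be checked with simple arithmetic. What follows is a minimal sketch, not a population-genetic model: the function name and the `local_mother_share` parameter are made up for illustration, and it assumes that each generation inherits half of its autosomal ancestry from fathers inside the migrating group and half from mothers, with local EEF mothers carrying no steppe ancestry.

```python
def steppe_fraction_by_generation(start_fraction, generations, local_mother_share=1.0):
    """Return the autosomal steppe fraction of the group, generation by generation.

    Hypothetical toy model (an assumption, not taken from any study):
    - fathers always come from the migrating group;
    - a share `local_mother_share` of mothers are local EEF women (0% steppe),
      the remaining mothers come from the migrating group itself.
    """
    fractions = [start_fraction]
    current = start_fraction
    for _ in range(generations):
        # Average steppe fraction among the mothers of this generation.
        mothers = (1.0 - local_mother_share) * current
        # Children are half father, half mother.
        current = 0.5 * current + 0.5 * mothers
        fractions.append(current)
    return fractions


if __name__ == "__main__":
    # Male-only migration, every mother taken from Neolithic communities.
    for generation, fraction in enumerate(steppe_fraction_by_generation(1.0, 5)):
        print(generation, f"{fraction:.1%}")
```

    With all mothers local, the steppe share halves every generation (100%, 50%, 25% and so on, down to about 3% after five generations) while the Y chromosome stays untouched. With all mothers coming from within the group (`local_mother_share=0.0`) the share stays constant, which is the scenario argued for here.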

    As for Basques being (other than Sardinians) the most EEF-like (in the autosomes) population from Europe, well, more or less that would be correct. Not by much, but yes. However, it’s important to understand the process as it happened. Olalde et al. 2019 put it this way:

    “We reveal sporadic contacts between Iberia and North Africa by ~2500 BCE and, by ~2000 BCE, the replacement of 40% of Iberia’s ancestry and nearly 100% of its Y-chromosomes by people with Steppe ancestry.”

    But note that the steppe populations that reached Iberia didn’t come from the Lower Don directly. They had to cross Europe over a few centuries. Before they left Central Europe and headed to Iberia they already had replaced 50% of Central Europe’s ancestry (here taken as the same thing as Iberian ancestry, i.e., EEF ancestry). All this 50% Central European ancestry present in those steppe populations from Central Europe was of Danubian origin, not Cardial. It was then, when crossing France and reaching Iberia, that they acquired an extra 10% of EEF admixture (this time of Cardial origin). Which would make the Iberian BBs at ~40% Yamnaya (that’s the 40% replacement in the study’s quote above) plus ~50% Danubian EEF (well, late European Farmer would be more appropriate, since they had more WHG ancestry by then) plus ~10% Cardial EEF (idem). This is why Basques are closer to EEF, because the small amount of admixture they got from local communities was added on top of the large amount of admixture already acquired in Central-Eastern Europe. Even a 1% addition from local sources would make them closer to EEF. But that “added total” is not a measure for the likelihood of switching to an EEF language.

    This is why I already mentioned above that Iberia is not the right place to look for a language shift for the steppe populations. If they were going to shift to some EEF language, they would have done so already in Central Europe where they had 50% EEF ancestry. Not from the 10% between France and Iberia. Whatever language those BBC folks spoke in Central Europe is the language they brought to Iberia, France and Britain. The case of Britain is even clearer than the one of Iberia, since we have no evidence, even indirect, of any Neolithic communities surviving by the time the BB folk arrived. The reason why I don’t use Britain instead as an example is because there we don’t have attested languages until much later. We just have the specific substrate in Insular Celtic as a good clue, but that’s never going to be as good as actual written evidence of a language. And needless to say that if BB males migrated to Britain alone, they would have gone extinct in one generation (and if the island was not deserted and they were getting EEF females for reproduction, their 50% steppe admixture at arrival would have disappeared in less than two centuries).

    Then you question if (or rather deny that) the Neolithic communities actually disappeared, saying they just were invisible to the ancient DNA record. But that’s clearly not correct at least for the most part (in most of the areas occupied by the CWH). In the case of Iberia and Britain, we have extensive sampling not just from the time of arrival of the BBC, but for the next 2000 years. The Neolithic communities were not invisible to the aDNA record or we would have seen them eventually. Or at least we would have seen them indirectly in the genes of the descendants of the steppe people. And why would settled agricultural communities of the late Neolithic, with dwellings, tools, pottery, etc… become invisible in the first place? The fact is that they indeed disappeared, and this point is crucial for what I’m trying to explain in the post.

    “Last but not least, there has been an obvious founder effect for R1b males.”

    Yes, but founder effects within different subclades of R1b only (in most of Western Europe). Since pretty much everyone was already R1b, any founder effect of any lineage would still be under the same typical western European R1b branch. IOW, the almost 100% R1b in Bronze Age Western Europe is not due to any founder effect. It’s just due to the people who repopulated the area being almost 100% R1b since they arrived.

    Yes, other places closer to contact zone areas (Carpathian Basin-Moravia/Bohemia, for example) are more complicated. Maybe other places did see a survival in exceptional cases of some Neolithic communities (you mention the Schönfeld Culture, which I don’t know any details about so I can’t comment much about it), but exceptions are just that. They don’t change the overall picture.

    I’ll have to read in some detail the papers you linked above to comment further on this last point. Especially interesting for me is the origin of those I2a haplogroups in the Unetice Culture. Back then the resolution was too low to know: not all I2a comes from Neolithic communities (ultimately from WHG), but some lineages were on the steppe and came from there. From a quick look at the paper you linked, those may have come from “Early CWC”, which would mean steppe in origin (should be expected), but I should check more carefully.

    I also leave more complicated points (like those about language diversity in European IE languages) for another reply. That’s a complex topic as I’ve discussed in the blog before. Not only is there no standard rate at which languages evolve, but even the direction of their evolution can be either way (“more time = more diversity” or “more time = less diversity”). In any case, good point to raise the issue about the difficulty of reconciling the shallowness of IE languages in Europe with the significant diversity among them.

  45. Before continuing I wanted to make sure my memory served me well regarding the steppe admixture in Etruscans. And I can confirm that they do have enough steppe for a convincing case. Yes, we can’t have almost 100% certainty that their language came from the CWH like in the case of Basque/Iberian, but it’s clearly a good possibility. Here you can see that they can be modelled as deriving 85% of their ancestry from Bell Beakers from Italy, while from the 25 Y Chr haplogroups available, 19 belong to R1b-L51+ (76%). For a run using directly Yamnaya as a source (not realistic, but to get an idea of how much actual steppe admixture) here it is showing some 29% (I also checked with BB from Bavaria, and it was around 52%).

    That’s clearly more steppe than any population from the Balkans, let alone West Asia and beyond.

  46. @Alberto: Thx for the extensive replies.

    1. Etruscan may be a case for CWH/NEBA. Or equally, if representing a surviving EEF language, a case that Basque may also represent such a language. I am open to both options. But it certainly makes sense to explore the cases of Basque and Etruscan [Raetic] simultaneously in more depth.

    2. Britain: Albeit Patterson e.a. 2022 (https://pmc.ncbi.nlm.nih.gov/articles/PMC8889665/) report signs of the EEF share increasing already during the EBA, including in Scotland, relatively unaffected by MBA immigration from the Continent, I agree that the shift is minor, and Britain is indeed a case of virtually complete population replacement by “Steppe” immigrants.
    Which would make Insular Celtic a prime subject for identifying NEBA (“Vasconic”) substrate. While Vennemann’s “Germania Vasconia” theory has received some review, and overwhelming rejection, when it comes to [West] Germanic, I am not aware of similar studies for Insular Celtic. This blog post https://euskerarenjatorria.eus/?p=38538&lang=en actually suggests Vasconic substrate in Insular Celtic. However, this may also stem from language contact, which certainly has existed especially with Ireland, and may have been quite intensive over the last 3-4 millennia. Anyway, the issue deserves further consideration and study.

    Some side notes:

    a.) The Orkneys (unsure, whether they technically even qualify as Britain) seem to have been an outlier. As per https://www.pnas.org/doi/10.1073/pnas.2108001119: “As elsewhere in Bronze Age Britain, much of the population displayed significant genome-wide ancestry deriving ultimately from the Pontic-Caspian Steppe. However, uniquely in northern and central Europe, most of the male lineages were inherited from the local Neolithic. This suggests that some male descendants of Neolithic Orkney may have remained distinct well into the Bronze Age.”
    Just a footnote to the overall picture, but maybe relevant when it comes to Pictish (possibly also the substrate in Irish). I also remember one paper co-authored by Schrijver that dealt with substrate in North Germanic and Saami, and identified some non-IE substrate shared by Scottish Gaelic and North Germanic [Old Norse]. Haven’t bookmarked the paper, will need to try finding it again.

    b.) Patterson e.a. 2022 report that “average EEF ancestry increased in North-Central Europe (Czech Republic/Slovakia/Germany)” during the EBA – actually significantly (from approx. 34% to 48% acc. to their Fig. 4). For the Netherlands, they report an increase from appr. 27% to 31%. Now, that may reflect EBA immigration from EEF-rich regions, e.g. the Balkans – albeit the Netherlands are quite remote from there. More likely IMO, however, is your NEBA hypothesis having been formulated too radically, ignoring sizeable pockets of non-Steppe-affected populations (in the case of NL e.g. the flint mines of Spiennes, and also Rijckholt, see my comments above).
    To the extent Davidski has processed the Patterson 2022 data, it might make sense to explore it further. I would focus more on the HG than the EEF element – differentiation between Goyet/El Mirón, Villabruna/Loschbour, KO1/Iron Gates and EHG (UA EN) seems to be the best way to distinguish local admixture from immigration, and eventually identify the direction of immigration, if there had been any.

    c.) Patterson e.a. also report substantial additional Steppe introgression into Iberia between the CA and the MBA (EEF down from 64% to 59%), w/o further discussion/ analysis.

    3. The above observation takes me to Iberia. Let me concede in advance that I stopped following aDNA studies intensively in 2020, and may not be up to date on recent findings. Your guidance is certainly appreciated in this respect. Nevertheless, I am aware of the Olalde papers, and also Villalba-Mouco e.a. 2021 (https://pmc.ncbi.nlm.nih.gov/articles/PMC8597998/). My take-aways from them are:

    a.) While Steppe ancestry in Iberia arrived during the CA (BB-period), it wasn’t until the EBA, i.e. after 2200 BC, that specific R1b lineages assumed a dominant position. Which means that we have to consider some kind of “founder effect”, or maybe “survivor effect”, after the apparent socio-economic crisis and turnover that hit at least S. Iberia around 2200 BC (discussed in more detail in Villalba-Mouco e.a. 2021, with possible reasons relating a/o to the 4.2k climate effect well documented for the E. Mediterranean, maybe also epidemics [Plague], and unsustainable land use).

    b.) The “Steppization” process appears to have been gradual, over time, and on-going during the EBA (compare Patterson e.a. 2021). If so, it would have increased the likelihood of the immigrants having been absorbed w/o language shift.

    c.) At least in S. Iberia (El Argar), there is little indication for “Steppization” by immigration of “shepherds, with children and grandparents, husbands and wives”, as you have put it. In fact, Villalba-Mouco e.a. 2021 report “rejection of models involving Germany_Bell_Beaker + C_Iberia_CA. (..) Notably, Bastida_Argar also failed for the distal model.” Instead, they propose an Iran-N-enriched source: “However, these three groups returned values ≥0.05 in the proximal local CA substrate model when Iran_N was added as a third source”. While failing to identify the specific origin of that source, possibly because of undersampling of the source regions in question, they state that “adding a central Mediterranean population to the outgroups (Sicily_EBA, Greece_EBA, or Greece_MBA) decreases the model support (P values) for Almoloya_Argar_Early and Almoloya_Argar_Late, indirectly attesting to the importance of central Mediterranean BA.”
    My takeaway is that we need to consider a “Steppization” component from the central/ eastern Mediterranean, most likely (via) Sicily, which would at least complicate, albeit not necessarily invalidate, your CWH/ NEBA idea.

    Having said that: I concede again that I have not followed recent studies in detail, and appreciate any guidance from you, especially as concerns your postulated immigration of “shepherds, with children and grandparents, husbands and wives.” If such immigration can be confirmed from aDNA, it is certainly a weighty argument, albeit one may still question the linguistic impact of sheep-herders vs. elites controlling tin trade from Britain to the Mediterranean, or gold and silver mining in Andalucia (El Argar).

  47. I forgot the most important note on Patterson 2021: The absence of significant IA migration into Britain makes its Celticisation difficult to explain. We need to go back to the MLBA, starting ca. 1500 BC, and have to assume that the migrants into Britain, most likely and/or in majority from NE France, already spoke Celtic or some kind of Para-Celtic (“Nordwestblock”). Which, under your NEBA theory, would push back IE-isation of NE France and in consequence Central Europe accordingly, i.e. into Tumulus (at latest) instead of Urnfield.

  48. Frank,

    Let me first also admit to not have followed too closely the studies from 2020 onwards. I just followed them distantly, and missed quite a few. I had to catch up a bit while writing the post and still doing it after publishing it. My feeling during these years was that what was coming out didn’t significantly change the picture, though.

    1. Britain: Clearly Britain is the strongest case of population replacement. If the steppe populations found any surviving neolithic groups they were really very few and basically undetectable in the genomes of the BB groups.

    For the Insular Celtic substrate, see this article: https://www.degruyter.com/document/doi/10.31826/jlr-2012-080111/html. But here I should reiterate my opinion about substrates in N and W Europe being clearly biased due to a very large portion of them being considered as IE just because they appear all over Europe (I’ll mention again that 50% pseudo-IE substrate in Catalonia, which must actually be Iberian, and is shared all over the continent). Until this is rectified, it will be very difficult to make sense of the substrates by just looking at those not shared between different regions.

    2. The paper about Bohemia you linked above has some very interesting samples regarding this discussion. For example, it has some Corded Ware samples that have no steppe ancestry, and as you can probably guess all of them are females. Three are from the site Vliněves (VLI008, VLI009 and VLI079) and one from Stadice (STD003), with dates respectively of 2894-2703 calBC, 2850-2497 calBC, 2853-2503 cal BC and 3010-2889 cal BC. So here we have the direct evidence of females being incorporated from Neolithic communities (in this case, I guess some late Globular Amphora communities, since it’s always been Globular Amphora samples the best fit for the EEF admixture in CWC groups).

    The paper also features several CWC samples that basically have no EEF admixture (KON005, modelled as 98.2% Yamnaya in table S9, dated 2868-2586 calBC, OBR003, 93.5% Yamnaya, 2911-2875 calBC, VLI076, 92.5% Yamnaya, 3018-2901 calBC), the latter one a female, with many others above 80% Yamnaya, with several females too, showing that this area was of particular importance when it comes to the early interactions between steppe groups and Neolithic ones. It’s around this area extending further west to Bavaria (still a relatively small zone) where the CWH people went from almost 0% EEF to around 50% EEF, and therefore the one that would matter most when it comes to any linguistic influence/shift (though I’d still say, very unlikely given the type of interactions).

    Interesting too are the early CWC samples carrying R1b-M269, and R1b-L151 which later became the BBC marker, showing that there cannot be further doubts about both cultures being the same people.

    When it comes to Únětice, what the data shows is that it started with a significant input coming from the NE, bringing higher steppe admixture (compared to the late BBC preceding it) and different male lineages. They also don’t have any high resolution when it comes to the I2a lineages, but it seems incompatible with the area and genetic profile of the incoming population that those I2a could come from EEF populations.

    I’ll elaborate on other of your comments when I can get more time, but as a quick note, the very high steppe females from the samples in the Bohemia paper provide direct evidence of females being part of these steppe migrations. But it’s not that we really need that direct evidence. If I can get enough time I’ll try to show it with numbers (and maybe in Iberia if that’s a better place), but I can confidently say that a male migration (or largely male) is incompatible with the samples we have. These were communities of people, families, moving together. For example, from the Villalba-Mouco et al. 2021 paper:

    “This observation suggests a substantial amount of steppe-related ancestry in El Argar BA individuals, which we tested formally and directly with f4-statistics of the form f4(Argar_Iberia_BA/SE_Iberia_BA, SE_Iberia_CA; Yamnaya_Samara, Mbuti) (fig. S5A and table S2.7). Significantly positive f4-values confirmed the presence of steppe-related ancestry in all BA individuals. We then tested for differences in affinity to steppe-related ancestry by contrasting northern versus southern BA individuals using f4(N/NE/C_Iberia_BA, Argar_Iberia_BA/SE_Iberia_BA; Yamnaya_Samara, Mbuti) (fig. S5B and table S2.8). The resulting f4-values confirmed a smaller amount of steppe-related ancestry in individuals from the Argaric sites La Almoloya and La Bastida compared to the rest of Iberia_BA groups, especially when compared to those from northern Iberia (fig. S5B and table S2.8), despite the complete turnover to lineage R1b-P312 (except for one subadult male in La Bastida) visible in the Y-chromosome record (Fig. 3B, table S2.6, and text S8). However, at the intrasite level, we observe no significant differences with respect to the amount of steppe-related ancestry between the early and late phase of La Almoloya and La Bastida based on PCA and formal f4-statistics (Fig. 3A and fig. S5), which suggests that the contribution is homogenized across the population.”

    As expected, a small difference in steppe ancestry between north and south Iberia (that’s already a larger area than the “core” Bohemia-Bavaria one mentioned above) due to still incorporating a small amount of females from Iberia itself. There’s not too much in it, though. In Table S2.13, they test several models for different BA populations of Iberia as a mixture of C_Iberia_CA_Stp (these seem to be the samples with highest steppe) + something else. They get best fits with adding Iran_Ganj_Dareh_Neolithic as a second source, but notice that it’s with negative values of it. The best fit without negative amounts is with Jordan_PPNB. As such, NE_Iberia_BA is modelled as 94.6% C_Iberia_CA_Stp + 5.4% Jordan_PPNB. For N_Iberia_BA it’s 91.2% and 8.8% respectively. And for SE_Iberia_BA_Argar (I assume this is the average of all the sites of El Argar and all timeframes) it’s at 86.8% and 13.2% respectively.

    I’d still agree with your impression of my too radical formulation of the dynamics of what happened with the CWH populations and their replacement of former neolithic people. But that’s from the point of view of someone into ancient DNA genetics. The reason I didn’t go into small details and nuances was for the sake of simplicity. I honestly don’t think that those nuances are enough to have a real impact on the outcome, so when writing for linguists and needing to cut the technical information of the post I just summarised it in a possibly too radical way for some tastes. I hope your comments are helping to give a broader perspective on the subject.

    More as I get time for it…

  49. Sorry, correction about those models from Villalba-Mouco et al. I mixed up the distance with the P-Value, so since they’re giving the P-Value, higher is better, not lower as I was taking it. Looking at the best models, which are often a two-way mixture of C_Iberia_CA_Stp + C_Iberia_CA (i.e., local steppe + local pre-steppe), for NE_Iberia_BA the local steppe source peaks at 75%. In central Iberia it is down to 56-58%. And in El Argar (average) it is at 42.6%. For the latter, adding a third source with a good model works with Ganj_Dareh_Neolithic and Jordan_PPNB, but both with negative percentages. However, for some specific sites from El Argar it does work well with positive values for Ganj_Dareh_Neolithic. For example, La Bastida works best with 7.2% Iran Neolithic.

    In general, it’s not surprising that the SE part of Iberia was able to incorporate the largest amount of Neolithic females, given that this was the most populated area and the one that lasted longest, as the Los Millares Culture. Nor is it surprising that it had some contacts with the eastern Mediterranean. But still, if one wanted to argue that Iberian came from Los Millares (which again, is very unlikely to happen by just taking some women from it), it could work for Iberian, but it would fail to explain why Aquitaine in SW France also spoke a related language. And regarding the eastern Mediterranean pretty much the same. So overall, interesting details, but they don’t change the picture.

  50. Jaydeep,

    Reading that paper by Rune Iversen was quite interesting. He argues that the CWC/BBC that first came from the steppe could not have introduced the IE languages based on the lack of concepts and features that are part of the PIE language. Where I disagree is with his solution to the problem:

    “To adopt new words into a language that describes concepts and features unknown to its speakers seems to go against the paleolinguistic method. These concepts, together with signs of Indo-European mythology, first appeared in the Early Bronze Age, period IB/II, c.1600/1500 BC (i.e. c. 1200-1300 years after the Single Grave culture and the supposed introduction of Indo-European). Hence, we must expect at least a “second round” of influences from the steppes introducing new words (originating in Proto-Indo-European vocabulary) together with new features such as woollen clothes, domesticated horses, spoke-wheeled chariots and figurative mythologically loaded iconography. A driver for this development could be the Sintashta chieftains.”

    While Sintashta (via its descendant culture that moved to the west, Srubnaya) could be responsible for providing the horses and chariots, I can’t see how the figurative art could be related to either of them. While Sintashta had already a bit of influence from Turan (David Anthony elaborates on it) it was small, and looking at the material culture of the Srubnaya Culture it still looks very crude (especially comparing it to that of the Scythians that succeeded it is night and day). But Scythians are too late to have brought these changes to Denmark. We must look at SE Europe for that. Besides, it doesn’t really make sense that the CWC was non-IE but then Sintashta was IE, since they were the same people. So again, we need to look at SE Europe for a source for the language if one wants to argue IE arrived at such date (1600-1500 BC).

  51. Frank,

    Re: the arrival of Celtic to Britain, the new paper just published is more informative than the old one you mention (Patterson 2021). Here’s the link again: https://www.nature.com/articles/s41586-024-08409-6

    In general, I think that by the middle Bronze Age the elites and networks associated with them started to be solidly established, and much more by the IA. Since the expansion of Celtic didn’t occur in a similar way to that of Latin (with a centralised system), it’s likely that the “Celtic package” (iron included) spread through these networks already established, and especially through its elites. In cases where elites were in conflict with each other it’s likely that there was a takeover. But in cases where they were allies, the package was probably just transferred along. The question of how the language spread with this package is open to debate, but it could have been adopted by the elites first as a form of prestige language and for communication, and then been gradually adopted by the rest of the population. Ancient DNA won’t show big movements, I’m afraid. So we’ll have to figure things out by looking at the details (sometimes genetic, sometimes archaeological).
