Origins and spread of Indo-European languages: an alternative view

After over 5 years of being away without officially leaving, I’ve finally got around to writing a closing post for this blog. And unsurprisingly, it deals with the Indo-European (IE) question, which has been the main focus of ancient DNA studies and the subject that has brought the most interest to the people who followed them. I thought I’d never have to write this post, since back when I stopped writing things were already clear enough and it should have been a matter of months before the mainstream publications wrote what I’m going to write now. However, 5 years down the line this is still pending, and many linguists have been trusting that the current mainstream view is essentially proved and that they have to adapt their theories to those findings. It was when I found this out some 2-3 years ago that I first thought of writing a final post about the subject (but it’s taken a while to actually do it), and that brings me to the main purpose of this post: it’s written mostly for linguists working on the history of languages, not for people interested in ancient DNA studies. This is because the latter can already make up their own minds about the interpretation of the findings, while the former are largely dependent on the interpretation and conclusions that they are given by those writing the studies. And I think they deserve to have an alternative view before they go too far in changing their works just because they may not fit well into what they’ve been given as proven facts about the origin and spread of IE languages.

Since this is quite a big subject, I’ll be looking at each geographical area (from the west to the east) and cover the basic evidence we have from each of them before going on to look at the whole picture and a final summary. And ultimately, as it’s been the case with all the previous posts on this blog, it will be in the comments section where there will be further discussions and details about all of the things mentioned in the post, so stay tuned for those and feel free to participate in them with any questions or thoughts so that we can all get a better understanding of the data available and its interpretation.

Europe

We’ll start with this area of the IE-speaking world, which is probably the easiest to understand. Unfortunately, the ancient DNA studies have not explained it in a way that is useful to historians and linguists. Here we’ll try to address that issue and explain the historical reality of the region on a population basis.

The Upper Paleolithic in Europe has been explained well enough, basically showing the discontinuities between different periods (Aurignacian, Gravettian, Magdalenian…). The old idea that modern Europeans mostly descend from the earliest Anatomically Modern Humans (AMH) that populated Europe (with Basques often cited as the most direct descendants of the Cro-Magnon people) has been thoroughly disproved. All of those early populations went extinct, and the last one to arrive did so probably just before the Last Glacial Maximum (LGM) some 25-22 thousand years ago (likely from Anatolia) and was not associated at the time with any specific culture until the Epi-Gravettian from after the LGM, mostly from Italy. It was this population that inhabited Europe during the Mesolithic period, and it has been named Western Hunter-Gatherers (WHG).

With the advent of the Neolithic a new population started to colonise Europe (again, from Anatolia, though this time the origin is proved and not just likely) bringing farming with them. They are known as the Early European Farmers (EEF) or, sometimes, Anatolian Farmers. Their expansion throughout Europe was slow (as these were sedentary populations) but they eventually populated most of Europe, replacing the Mesolithic WHG populations that preceded them. However, throughout this long period and as they ventured deeper into Europe, the farmers did get increasing levels of admixture from those WHG populations, reaching levels of up to 25% of WHG genes in their Anatolian Farmers’ genomes.

However, by the end of the Neolithic another big event happened at the population level in Europe, and it is the most crucial one for the purposes of this post and the one that has not been very well explained so far.

Depopulation and Repopulation of Northern and Western Europe

Some 10 years ago, genetic studies started to appear (Haak et al. 2015 was the first of them) showing a surprisingly large migration from the Eurasian steppe into Europe at the end of the Neolithic (starting c. 3000 BC) that changed the genetics of the European populations and formed the basis of what modern Europeans are. These steppe migrations were associated with the Corded Ware Culture (CWC) in Northern to Central Europe, and with the Bell Beaker Culture in Western Europe. It was said that their genetic impact was roughly 50% across Northern Europe, going down to some ~30% towards Iberia. It was also stated that there was a male bias in this genetic impact, given that the Y Chromosome (passed from fathers to sons) of Europeans turned out to be of steppe origin while the mitochondrial DNA (passed from mothers to sons and daughters) was largely of Neolithic origin. Somehow, our (modern Europeans’) fathers came from the steppe and our mothers from Anatolia. It was also speculated that the reason for this could be that the steppe people brought some pathogens with them that may have severely affected the Neolithic populations they came across, with some strain of Yersinia pestis (the cause of the plague) found in steppe samples being the main suspect (link, link). This was also in line with other genetic studies (link) that had shown a big bottleneck in populations across Europe (particularly all across northern and western Europe, with Italy and the Balkans seeing a smaller one and Greece not seeing it at all, see Figure 2) at the end of the Neolithic as well as a rapid expansion of a few paternal lineages (link).

Now, this may be correct (or mostly), but it fails to provide an explanation of what happened that can be easily understood by anyone confronting this information. We have to take a step back, focusing not so much on the genes but rather on the people (the communities of people) who carried those genes so that we can make better sense out of it.

During the Neolithic period, communities of people from Anatolia started to settle in Europe, advancing slowly until they occupied the majority of the European territory. They had a distinct genetic profile when compared to the WHG that lived in Europe before their arrival. This applies both to the autosomes (basically their whole genome) and to their uniparental markers (the Y Chromosome for the paternal ones and the Mitochondrial DNA for the maternal ones). The most prevalent paternal lineages were the ones under the G2a branch. WHG, on the other hand, had most of their paternal lineages under the I2a branch. Minor paternal lineages in both populations didn’t overlap either, at least initially. However, slowly over the 4000 years between ~7000 BC and ~3000 BC, the farming communities admixed occasionally with the hunter-gatherers, which resulted in their acquiring genome-wide signatures of WHG (very low in the Balkans, but increasing towards central, northern and western Europe, to around 25%) as well as uniparental markers. Interestingly, once the WHG paternal lineage I2a entered the farmers’ gene pool, it rose in frequency to the point that by the end of the Neolithic it had become the most common one among farmers, relegating their original G2a to second place. This pattern usually points to some sort of selection, though in this case the reason is unclear (and for the purposes of this post, irrelevant anyway).

Then around 3000 BC something happened throughout Europe, affecting especially the northern and western parts of it, and causing a big population collapse. The reasons for this are unknown – it could be a change in the climate (the end of the warm period known as the Holocene Climate Optimum, which triggered another series of events that could include hunger due to the lack of crops, disease, an increase in violent conflicts, etc.), but once again the reason is not really relevant for the purpose of this post. Suffice it to say that the Neolithic population across Europe got severely decimated, with many areas becoming completely depopulated.

Meanwhile, in the North Pontic steppe small populations of pastoralists had started to thrive with their mobile economy that was not based on crops, but instead on animal husbandry. These types of populations have proven to be more resilient to the sort of changes that greatly affect the larger, more densely packed and sedentary ones that rely heavily on crops. They’re also much more mobile and can occupy the territory much faster if the conditions allow for it. And this is basically what they did when the Neolithic communities from Europe collapsed.

These steppe populations had probably originated on the North Caspian shores (maybe when they started to have domesticates in the mid-late 6th millennium BC, if not earlier) but it was not until they moved west to the North Pontic region and the conditions allowed for it (the invention of wheeled vehicles ca. 3500 BC seems to have been a crucial factor, though pulled by oxen, since they didn’t have horses as once believed) that they started to expand very successfully. The initial separation into the two main groups may have happened around that time (mid 4th mill. from some North Pontic culture like the Lower Mikhaylovka groups) to end up forming the Yamnaya Culture and the Corded Ware Culture (CWC), with the former occupying most of the steppe (especially if we include the very closely related Afanasievo Culture) and the latter expanding into what I will refer to as the Corded Ware Horizon (CWH), which would include the Bell Beaker Culture (BBC) to the west and the forest steppe cultures of the time (Fatyanovo-Balanovo, Abashevo, Sintashta, Andronovo) to the east, covering an extremely vast territory that went from Western Europe to Southern Siberia by the end of the 3rd mill. For now we’ll turn our attention to this CWH group.

The CWH people separated from the other main steppe population by heading north and leaving the steppe for the forest steppe. While the exact place and time of their initial steps is not known so far, we do know that they started to expand ca. 3000 BC reaching the Baltic Sea and moving west to Central Europe where they appear around 2800 BC. During this initial expansion, they encountered a few areas where the Neolithic populations had not died out completely. And the reason why we know this is because the steppe populations started to show admixture from those Neolithic Farmers (probably from those left from the Globular Amphora Culture) and we know that this admixture came from incorporating EEF females into their communities. We don’t know the details of how these “foreign” females were incorporated (could be from peaceful agreements, could be by force, we don’t know), nor their exact status in these steppe communities. But we do know that their offspring must have had the exact same status as the rest of the people in the community, since there’s no genetic difference across these communities, where the “foreign” genes were spreading equally among the whole community. Population growth must have been a priority for these small steppe communities, probably because the conditions they were finding allowed (and maybe demanded) such growth. They were successfully populating vast, largely depopulated areas that they could exploit and it seems that whenever they had the chance to incorporate females from the few Neolithic communities they found along the way, they did so in order to increase the growth rate. Males, on the other hand, didn’t seem to have been welcome, probably due to the patriarchal nature of these steppe people that organised themselves in family clans (much like the late Neolithic farmers from Europe did too). 
The evidence for this dynamic of incorporating females but not males is very clearly seen by looking at the uniparental markers, where we do see European Neolithic haplogroups in their mitochondrial DNA, but not a single European Neolithic haplogroup in their Y chromosome, and then at their autosomes, which show the genome-wide admixture they were getting via these females.

By the time they reached Central Europe around 2800 BC, the CWH people had around 30% admixture from the Neolithic farmers. Quite a significant amount, but not surprising given how fast a small population can change genetically when they start incorporating “foreign” genes into their pool. Then during their stay in Central Europe, this admixture increased to around 50% by ca. 2500 BC (which means that they still found some Neolithic communities that survived there and from which they could incorporate females). However, one may wonder what was happening in those Neolithic communities meanwhile. That’s something we don’t really know. We don’t have a single sample in the ancient DNA record from the Neolithic communities from the periods just before, during or after the arrival of the steppe communities. The only evidence we have that some of them survived the collapse comes precisely from the admixture that we see in the steppe communities that were occupying their former territories. So, essentially, a few of the Neolithic communities lived just long enough to see the steppe ones arriving and acquiring females from their communities before they died out completely (we don’t know if it was this “borrowing” of females by the steppe groups that precipitated their final extinction, though that’s a possibility even if the “borrowing” of females didn’t imply any violence).
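To get a feel for how fast a small community can change when it keeps incorporating outside females but no outside males, here is a minimal sketch in Python. It is only an illustration under loudly stated assumptions, not an estimate from the data: the function name `simulate_female_inflow` and the 10% per-generation inflow are hypothetical, the incoming females are assumed to carry no steppe ancestry at all, and their genes are assumed to spread evenly through the community.

```python
def simulate_female_inflow(female_inflow, generations):
    """Illustrative model: steppe ancestry in a community that takes in
    outside (farmer) females but no outside males.

    female_inflow: assumed fraction of mothers in each generation who come
                   from outside and carry no steppe ancestry (hypothetical).
    Returns a list of (autosomal, mtdna, y_chromosome) steppe fractions,
    one tuple per generation, starting with generation 0.
    """
    autosomal, mtdna, y_chromosome = 1.0, 1.0, 1.0
    history = [(autosomal, mtdna, y_chromosome)]
    for _ in range(generations):
        # Average steppe ancestry of this generation's mothers.
        mothers = (1.0 - female_inflow) * autosomal
        # All fathers are community members; a child is half mother, half father.
        autosomal = 0.5 * autosomal + 0.5 * mothers
        # mtDNA passes only through mothers, so outside lineages accumulate.
        mtdna *= 1.0 - female_inflow
        # The Y chromosome passes only through fathers, so it never changes.
        history.append((autosomal, mtdna, y_chromosome))
    return history


if __name__ == "__main__":
    # 10% per generation is an arbitrary illustrative value, not an estimate.
    for gen, (auto, mt, y) in enumerate(simulate_female_inflow(0.10, 8)):
        print(f"gen {gen}: autosomal {auto:.2f}, mtDNA {mt:.2f}, Y {y:.2f}")
```

Under these assumptions the steppe share of the autosomes shrinks by a factor of (1 - inflow/2) every generation, so with the illustrative 10% it is down to about 66% after 8 generations, the steppe share of the mtDNA is down to about 43%, and the Y chromosome is still entirely of steppe origin. That is the same qualitative pattern described above: growing genome-wide farmer admixture, farmer maternal lineages, and no farmer paternal lineages.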

Then from Central Europe, around 2500 BC, these steppe communities continued their expansion to Western Europe. We know the communities that did so as the Bell Beaker Culture (BBC), but they were the same people. Curiously, this expansion to Western Europe started from a very small clan within the CWC people. And we know this because they had a Y Chromosome haplogroup that was very rare among the CWC (the vast majority of males from the CWC had a subclade of the R1a branch, while the males from the BBC had one from the R1b branch). This also stresses how small the initial population that repopulated Western Europe must have been: essentially a small family clan that, once settled in Central Europe, started to be successful and then went on to occupy the whole of Western Europe. This, again, was facilitated by the fact that most of Western Europe had become almost completely depopulated. For example, the BBC people who colonised the British Isles were genetically identical to how they were already in Central Europe. In other words, on their way to the Isles and on the Isles themselves, they didn’t seem to have found any females from surviving Neolithic groups to incorporate into their own communities and grow faster. Here, both the lack of direct evidence of any surviving Neolithic community and the indirect evidence from the absence of admixture in the steppe populations that moved across that territory indicate that it was almost completely (if not completely) depopulated.

On their way to the Iberian peninsula and in the peninsula itself, however, they did find some surviving Neolithic communities, as again we see further admixture coming from the “foreign” females they were incorporating into their own communities. By the time they had settled the Iberian peninsula, this admixture had increased to around 70%. But again, we have no direct evidence of these surviving Neolithic communities from the time when the steppe people arrived. It’s just the indirect evidence (in the form of admixture in steppe populations) that allows us to know that they must have been there, even if only briefly once the steppe people arrived (whether the arrival of the steppe people is what precipitated their extinction is something that, once more, we don’t really know – but it seems plausible).

What the whole process described in the above paragraphs basically means is that Northern and Western Europe were completely (re)populated by people who came from the steppe. By communities, clans, of people that came from the steppe. This was not a 50% replacement of the previous Neolithic population. It was a 100% replacement. Every single Neolithic community died out before or at the time the steppe communities arrived. The fact that some (or many) of the genes from EEF survived (through those females that were incorporated into the steppe communities and passed their genes along) does not have any historical (and therefore linguistic) relevance. The people, the communities of people with their culture and language, that populated all of these parts of Europe were originally from the steppe. All of them. We don’t have evidence of even a single exception. The paternal lineages from the Neolithic people disappeared simply because the Neolithic communities of people disappeared.

Thus, after this expansion throughout the 3rd mill., we have the CWH people all the way from Western Europe to the Altai Mountains of South Siberia. And they were the sole occupants of all that area. Basically a big family very closely related to each other and without any discontinuity in their occupied territory. Which means, clearly, that they all spoke the same language, and probably that the divergence between the language spoken by someone in Ireland or Iberia (BBC) and someone in Southern Siberia (Sintashta-Andronovo cultures) ca. 2000 BC was not very large. Which takes us to the next question: which language was that?

Mainstream studies have been suggesting that the CWC must have spoken something they called “Indo-Slavic”, i.e., an Indo-European language from which both Balto-Slavic and Indo-Iranian languages descended. But that would imply that such a language was also spoken throughout Western Europe, something that we don’t have any evidence of whatsoever. Moreover, it would imply that Celtic and Italic would be descendants of Indo-Slavic, something that is at odds with basic linguistics.

Therefore, it would be better to suggest that they spoke an older form of Indo-European from which all others descended (except the Anatolian branch, and maybe Tocharian). The only problem is that not only do we not have any evidence to support this, but all the evidence we have contradicts this idea.

To examine this, we should start from the easiest place: the Iberian peninsula, where we have earlier evidence of languages than in the rest of the territory of the CWH, and which, being rather isolated in the far west, is free of confounding factors. And when we look at the earliest languages known from there, we see that the languages that had not been replaced by the recent (at the time of the recorded languages) Celtic expansion were non-Indo-European. I already wrote a few years back some insights about the languages of Iberia, looking at the relationship between Basque and Iberian, as well as at the substrates. There I presented some of the latest linguistic research (which was, and probably still is, only available in Spanish) showing the paradigm shift from considering the relationship between Basque and Iberian a sort of legend to the now most accepted idea that they indeed belong to the same family. I also explained that one of the obstacles that this possible relationship had to overcome was the belief that the Basque and the Iberian people were completely unrelated, with Basques being descendants of the first European AMHs and Iberians being a Mediterranean population. This problem is not only solved now, but the fact that we now know that Basques and Iberians were the exact same people who arrived shortly after 2500 BC and settled the whole peninsula (without any of the Neolithic populations that lived there before surviving) actually makes it almost impossible to argue that they could speak different (unrelated) languages. This is one of those cases where ancient DNA has come at the right time to confirm without a doubt the recent (and at the time slightly controversial) linguistic research. (As a side note, when talking about Iberia I don’t refer to Tartessian because of its unclear classification, with the only possibilities being that it was either a Celtic language or, more likely, a form of Iberian).

When it came to substrates, I pointed out how the ancient DNA evidence had disproved a line of research that had become very popular and accepted: the Indo-European substrate throughout the Iberian peninsula, especially strong in areas (south, east and even the Basque Country itself) where non-Indo-European languages were spoken at the time of our first records. This theory was championed by the prominent linguist Francisco Villar, who was finding Indo-European substrates everywhere, but he was very adamant in pointing out that they were non-Celtic and non-Italic (and obviously not Indo-Slavic either, just “unknown” IE). This was all a way to prove the Paleolithic Continuity Theory, and it had several followers who contributed to it. In that article, I looked at one study (in English) by Leonard A. Curchin where he goes through the substrate in Catalonia (an Iberian speaking region) and finds that 50% of it comes from that non-Celtic, non-Italic IE branch (in contrast, he only finds 10% of the substrate to be Iberian). The confirmation that this theory cannot be correct has significant implications, since the reason why that 50% substrate was considered IE was none other than the fact that it was found in many other parts of Europe (where Iberian could have never been spoken, according to the tradition). This brings us to the next point, which is the large amount of non-IE words incorporated into the reconstructed PIE. This heavily “vasconised” (from Vasconic) reconstruction of PIE has also been found by one researcher based on statistical analysis (I can’t comment on the validity of the method used, but somehow the result seems to be correct, even if by chance):

“The new surprise is that PIE, as usually reconstructed, appears to be a sister-language of Basque, in complete breakaway from Hittite. Amazingly, PIE would be as close to Basque as the North Caucasic languages are close to each other. This clearly shows that PIE, as usually reconstructed, must be seriously erroneous and contains plenty of substratic Paleo-European words, that drag the general picture away from Hittite and closer to Basque.” A lexico-statistical comparison of Basque, Arnaud Fournet (draft, 2018).

A related phenomenon was found by Ranko Matasović when looking at the substrate in Balto-Slavic, noticing a common substrate in Northern and Western European IE languages not present in SE European ones:

“This paper presents an analysis of those words, attested in Balto-Slavic, that do not have a clear Indo-European etymology and that could have been borrowed from some substratum language. It is shown that Balto-Slavic shares most of those words with other Indo-European languages of Northern and Western Europe (especially with Germanic), while lexical parallels in languages of Southern Europe (Greek and Albanian) are much less numerous.” Ranko Matasović, Substratum words in Balto-Slavic, 2013.

When we look at modern Basque, we see that it’s absolutely full of Latin/Romance loanwords, which is expected given the last 2000 years of history, while it has very few Celtic ones (also expected, since their resistance to the Celtic expansion must have made them enemies and limited their contacts during the several centuries of neighbourhood), but there’s no trace of the old IE language that the CWH people would have spoken during the 2000 years before the arrival of Celtic.

Looking outside of Iberia we keep finding problems that can’t be explained if the CWH people had spoken an IE language. A non-Indo-European substrate in insular Celtic (usually considered either Afro-Asiatic -which now we know can’t be correct- or Vasconic) wouldn’t make any sense. Nor would it make any sense for Germanic to be the least Indo-European of all the known IE branches at its core. There is a clear necessity for Northern and Western Europe to have a non-IE substrate, and an even clearer necessity to have a source for the non-IE languages attested. For a substrate, you need longstanding interaction between locals and migrants, with locals (usually the majority of the population) switching gradually to the language of the incoming people, first as a second language and eventually as the only one. This didn’t happen here, since interactions between locals and incoming people ranged from very short to non-existent depending on the place, and no local population switched to the language of the migrating one because, quite simply, no local populations survived.

In summary:

  • Northern and Western Europe experienced a population collapse at the end of the Neolithic (starting around 3000 BC and finishing around 2300 BC in some southern areas of Iberia).
  • Populations from the steppe (CWC and BBC, who were the same people) repopulated all of Northern and Western Europe. A 100% population turnover.
  • These populations from the steppe came from a small group initially, so they all had to share the same language.
  • That language had to be non-IE according to all the evidence we have.

However, the good thing about both IE and whatever language was spoken by the CWH (which, due to its geographical and temporal location, I will refer to from now on as the North Eurasian Bronze Age (NEBA) language family) is that the areas covered are very large, so we will go through the rest of them to test what I’ve proposed here against the data we have from those areas.

UPDATE: This study published a year ago about the Neolithic in Denmark shows quite clearly what I explained above. They find a near total replacement first between the Mesolithic and the Neolithic, with low if any input from local hunter-gatherers. And 1000 years later, with the transition from the Funnelbeaker Culture (TRB) to the Single Grave Culture (SGC), again they find a near complete replacement:

“Insights from a few low-coverage genomes have indeed shown a link to the Steppe expansions, but by mapping out ancestry components in the 100 ancient genomes we now uncover the full impact of this event and demonstrate a second near-complete population turnover in Denmark within just 1,000 years. This genetic shift was evident from PCA and ADMIXTURE analyses, in which Danish individuals dating to the SGC and Late Neolithic and Bronze Age (LNBA) cluster with other European LNBA individuals and show large proportions of ancestry components associated with Yamnaya groups from the Steppe (Figs. 1 and 3 and Extended Data Fig. 1). We estimate around 60–85% of ancestry related to Steppe groups (Steppe_5000BP_4300BP), with the remainder contributed from individuals with farmer-related ancestry associated with Eastern European GAC (Poland_5000BP_4700BP; 10–23%) and to a lesser extent from local Neolithic Scandinavian farmers (Scandinavia_5600BP_4600BP; 3–18%)”

They also find that the overlap between both populations (TRB and SGC) was likely very short:

The age of the Gjerrild skeletons (from around 4,600 cal. bp) matches the earliest example of steppe-related ancestry in our current study, identified in a skeleton from a megalithic tomb at Næs (NEO792). We estimated around 85% of Steppe-related ancestry in this individual, the highest amount among all Danish LNBA individuals (Extended Data Fig. 6a). Notably, NEO792 is also contemporaneous with the two most recent individuals in our dataset showing Anatolian farmer-related ancestry without any steppe-related ancestry (NEO580, Klokkehøj and NEO943, Stenderup Hage) testifying to a short period of ancestry co-existence before the FBC disappeared—similar to the disappearance of the Mesolithic Ertebølle people of hunter-gatherer ancestry a thousand years earlier.

Allentoft, M.E., Sikora, M., Fischer, A. et al. 100 ancient genomes show repeated population turnovers in Neolithic Denmark. Nature 625, 329–337 (2024). https://doi.org/10.1038/s41586-023-06862-3

Italy

In contrast to Northern and Western Europe, Italy didn’t experience a complete collapse of the Neolithic population. It’s likely that several areas got severely decimated or even completely depopulated, but Neolithic communities still persisted during and after the arrival of the people from the steppe. Therefore, the picture we have is quite different, with two populations of different origin inhabiting the area during the Bronze Age.

From a linguistic point of view this would mean that two language families may have been used throughout the Bronze Age, one from the EEF (unknown family) and the other one from the CWH (NEBA language). The picture we get by the Iron Age, when we start to have evidence of the languages spoken in continental and peninsular Italy, is analogous to what we see in Iberia: all the populations that didn’t switch to the recently arrived Celtic and Italic languages spoke a non-IE one. We don’t have any traces of an Indo-Slavic language or any other old form of IE that could be attributed to an arrival ca. 2500 BC.

Looking at the genetics, we have samples from Etruscan and Italic speakers from Central Italy and they are both more or less identical and both largely descend from the CWH people (not 100% as in Northern and Western Europe, since in Italy they did admix further with the EEF that lived on through the Bronze Age). In other words, while no conclusive evidence can be learned from Italy alone, it’s all compatible with what we’ve seen in the previous section. To clarify, the Etruscan language itself could either come from the CWH (more likely) or from the EEF (less likely, but perfectly possible). This is ultimately a linguistic problem. (NOTE: As I was writing this, a new study with samples from Iron Age Picenes from Novilara and Pesaro -North Picene speakers, a poorly attested and controversial language- has been published. No surprises, as the samples resemble the above mentioned ones, being largely of steppe origin. UPDATE: some models of Etruscans here and here.)

For the sake of completeness, a short note about Sardinia. Modern Sardinians are outliers among the European populations in that they derive most of their ancestry from the EEF that colonised Europe from Anatolia during the Neolithic. However, ancient DNA does not show complete continuity since the Neolithic. We have samples from the Bronze Age that have steppe origins. The contacts between Sardinia and the Mediterranean coasts of Iberia, France and Italy are then proved by these samples, though even without them it would still be reasonable to think that there were longstanding contacts between Sardinia and those other areas that were inhabited by CWH people. Therefore, it would be a mistake to assume, based on the modern DNA, that Paleo-Sardinian must be a language that came from EEF. It may well be from that source, but it may as well be a NEBA language borrowed from the neighbouring regions of mainland Europe. Once more, this is just a linguistic problem, since DNA allows for both options.

South Eastern Europe

Unlike the rest of Europe, the Balkans didn’t see any migration from the CWH people. Instead, it was the sister branch, the Yamnaya people, who moved into the Balkans in the period from ca. 3200 BC to 2500 BC. As in Italy, the Balkans didn’t see a full collapse of the Neolithic populations, but probably the northern parts of it did see a significant decimation in the Neolithic people that facilitated the arrival of steppe populations. The southern parts (modern day Greece) remained fully populated by its Neolithic inhabitants along the 3rd millennium.

It’s hard to estimate accurately the impact of the steppe migrations in the Balkans due to not having enough samples so far, but in general we can say that it was significant but relatively modest compared to the rest of Europe. After 2500 BC, it’s likely that no new migrations occurred from the steppe, and the steppe people who were already in the Balkans must have started to mix with the local populations (more on this later).

From a linguistic point of view, what is remarkable at first sight is that we don’t have any surviving non-IE language in mainland SEE, even though it’s the area where languages were attested earlier than in the rest of Europe. The best explanation for this is that Indo-European speakers entered SEE at an earlier date, replacing the languages of both the EEF and the Yamnaya people before the Iron Age.

We are still missing the direct evidence from the critical samples, but we’ve had the indirect evidence for quite a while. Let’s look at the details.

Indo-European populations started to enter SEE during the period 2400-2000 BC. They came from West Asia (North West Anatolia was the immediate origin, but ultimately their origin had to be deeper into West Asia, around the South Caucasus) and settled the area of Thrace during this period. We don’t have direct genetic evidence of this, since we simply lack any samples from this place and time, so I’ll quote from a relevant paper about the archaeological side of it:

“So, while the first half of the 3rd millennium BC in Thrace is characterised by a (comparatively) moderate level of social and economic complexity and the ideological dominance of pastoral tribes of a north-Pontic origin, there is a real explosion in complexity in the period between 2400 and 2000 BC and the region becomes increasingly included within a much wider network that is now dominated by frequent and highly visible exchange and trade, and new forms of prestige and status expression”

“The same conclusion of the existence of foreigners is also indicated by the use of many exotic and prestigious objects, often made of silver. This metal was not readily available in EBA Thrace. We can also note that tin-bronzes may have arrived into this region via Anatolia rather than Europe […] and it is difficult to imagine how such a quantity and quality, and the imaginations and customs behind these, can be transferred to Europe without having individuals or groups of people carrying them, and the infrastructure to organise their transport and wider distribution”

“There can be no doubt that the driving force behind this influx of goods and people is enhanced exchange and organised trade, and it is in no way an accident that concurrently the largest exchange network the world had seen up until then arrived at its peak. This network was centred in southern Mesopotamia, a region that had been fully urbanised for at least a millennium, and it stretched from as far away as western India on one side to southeast Europe on the other, and it also incorporated large parts of Central Asia”

Kanlıgeçit – Selimpaşa – Mikhalich and the Question of Anatolian Colonies in Early Bronze Age Southeast Europe, Heyd et al. 2016.

Now we’ll have to look at some genetic details from Greece in order to see how this may be reflected in the ancient DNA that we have available. As mentioned earlier, Greece didn’t see a population collapse in the period around 3000-2500 BC. There was continuity from the early Neolithic until after 2500 BC (just small amounts of ongoing genetic exchange with neighbouring regions, but nothing remarkable). The steppe population that moved through the Balkans during the EBA didn’t reach Greece during that period. It was only once they had settled and admixed with local populations in the Balkans that we first see an intrusion into Greek territory, in the last part of the 3rd mill. To see the sequence of events, we’ll start by looking at 4 samples labelled as Greece_Perachora_BA (G31, G62, G65 and G76a) dated 2700-2200 BC:

To understand what this shows: the columns are sampled populations from different locations and periods. In this case the first two columns (after the initial one with the target samples from Greece mentioned above) are samples from the Bulgaria Chalcolithic (BGR_C) and the Greece Neolithic (GRC_Peloponnese_N), and they are meant to represent the Neolithic/Chalcolithic population of the Balkans. The next three columns represent West Asian populations (the Kura-Araxes Bronze Age culture from the South Caucasus with samples from what is today Armenia, then samples from the Levant Early Neolithic, if I remember correctly from what is today Israel, and finally samples from the Central Anatolia Chalcolithic). The last column contains samples from the Yamnaya culture of the steppe, from around 3000-2500 BC.

In the rows we have the four samples from Greece (Perachora, Bronze Age) mentioned above. What we see is that they can be mostly modelled (97.2% on average) with the first two columns, representing local populations from the Balkans Neolithic/Chalcolithic. There’s only 2.5% West Asian admixture over whatever was already there in the Neolithic/Chalcolithic (which wasn’t much), and the 0.4% from the steppe is within noise levels, so basically nothing at all.

However, during the period from 2300-1900 BC we have a few samples that are clearly different:

These samples derive two thirds of their ancestry from the Balkans Neolithic/Chalcolithic, and the other third from the steppe. We don’t know where these samples may have come from, but probably from the Western Balkans, where steppe admixture was higher.

However, this was not the last movement of populations into Greece. Here we have some groups of Mycenaean samples from 1600-1200 BC:

Here we see that Mycenaean Greeks have 20% ancestry from West Asia that was not present before their arrival, indicating a very significant change in the population somewhere between 1900 BC and 1600 BC. This Mycenaean type of ancestry is the one that persisted during the classical period, as we can see from these other two samples from the Greek colony in North East Iberia of Empuries, dating one to around 750-400 BC and the other one around 350-200 BC:

Note that the above samples were outliers among the ones from that colony, where the others were local Iberians, who are very different, as can be seen below:

Since we are missing samples from South East Europe from the period around 2400-2000 BC, it’s difficult to pinpoint the exact origin of the Mycenaean people, but it had to be somewhere around Thrace or North West Anatolia. Once we get samples from that time and place, we’ll also be able to better assess their origin within West Asia. But since we know that the largest part of Anatolia was settled by speakers of the Anatolian branch of IE languages, it seems necessary that the origin was beyond Anatolia, with the South Caucasus being the most likely place.

A last note for completeness about Crete. There the West Asian admixture arrived earlier than in mainland Greece, and its likely source was South East Anatolia. This leaves us with two options about the affiliation of the Minoan language: it could either come from the local Neolithic inhabitants (EEF), which would basically make it an isolated language, or it could come from the Anatolian side and be an IE language of the Anatolian branch. There’s no evidence that it could be related to Greek itself. For reference, here’s how they look:

And with this we’ll leave Europe for now (more later) and move on to Asia.

Asia

Anatolia

Anatolia was the origin of the Neolithic population of Europe, as mentioned. In the early neolithic, they had their characteristic genetic signature, but as time passed there was a significant mixing among West Asian populations that made all of them get admixture from the others. Since Anatolia is at the west end, that admixture was mainly from the east (South Caucasus/North Mesopotamia and beyond), and from the south (Levant). This makes it a bit more difficult to distinguish migrations between these areas, since we need enough resolution to see a significant change in a short period of time in a specific place to know that there was a migration and not just the ongoing general admixture that was happening all the time. The increase in admixture from the South Caucasus from the Neolithic to the Chalcolithic is evident and can perfectly justify the arrival of IE languages from the east (though let’s remind ourselves that a migration is not always necessary for the spread of a language, nor does a migration guarantee a language shift unless it’s a complete replacement as seen in Europe). We’d just need a higher resolution to find the specifics that might have brought the IE language from the Caucasus to Anatolia in the period around 4000-3500 BC.

Above, two Neolithic populations from around Central Anatolia. Below two Late Chalcolithic ones:

The shift to the “east” (more admixture from South Caucasus, less from Western Anatolia) is very clear, but this is very general and we’d need more detailed data to pinpoint a putative IE arrival.

In any case, the latest publication (The genetic origin of the Indo-Europeans) from one of the main teams doing this research already went with the hypothesis that the IE languages arrived in Anatolia from the South Caucasus, which should be correct, so I don’t think I need to go any further into this point.

South Caucasus

Here is where my views diverge from the above mentioned study. The reasons should be obvious already, since that paper argues that PIE (what they call Indo-Anatolian) originated in the North Caucasus/Lower Volga area, and from there crossed to the South Caucasus, from where it went to Anatolia. They need this scenario because they still argue that the steppe populations (Yamnaya and CWH) were the ones that spread the rest of the IE languages (all except the Anatolian branch), while I’ve been arguing so far that those steppe populations spread non-IE languages that I’ve referred to as NEBA languages. Apart from the fact that the European linguistic reality requires a non-IE substrate, not to mention a source for the known non-IE languages, it is not very plausible that the Chalcolithic societies of the South Caucasus adopted the language of the incipient pastoralists of the steppe Eneolithic. It would have been much more likely to go the other way, but from what we know it didn’t, and the steppe pastoralists kept their original language (at least at this stage; more on this later).

My preferred view about the arrival of IE languages in the South Caucasus is that they came from the east. When I read several papers about the archaeology of the South Caucasus some years back, there was a clear suggestion that new people started to arrive there around 4200 BC, and that these people were the ones who later formed the Kura-Araxes Culture (which is more commonly dated to start around 3700 BC, probably because the migration was slow and lasted a few centuries). Their origin was unknown. However, we’ve been lucky to get some of those early samples from around 4200-4000 BC from Armenia (Areni Cave), and they are in fact considered part of the Kura-Araxes Culture despite their early dates. Coincidentally, it’s those same samples that the latest study mentioned above chose as the earliest IE speakers in the South Caucasus, arguing that they came from north of the Caucasus since they have steppe admixture. However, those samples also have admixture from the east (though I’d also say they are quite strange in their genetic profile and difficult to analyse), and crucially they happen to carry an unusual male lineage (the Y chromosome haplogroup L1a) which is quite rare, but clearly came from much further east and not from the steppe. Later samples are clearer in their autosomal profile, so as an illustration here are the oldest 3 samples (other than those from the 5th mill. from the Areni Cave) from the Kura-Araxes Culture, dated to the late 4th mill. (3350-3000 BC), as well as the 3 oldest from the Maykop Culture from the North West Caucasus, also from the 4th mill. (3375-3500 BC):

As seen the largest part is still local, but there are some significant contributions from the north (represented by some samples from the steppe north of the Caucasus mountains from around 4200 BC) and from the east (represented by some samples from Turkmenistan -Geoksiur- Neolithic).

I said above that an arrival (of IE languages to the Caucasus) from the east would be my “preferred” scenario because I don’t consider it strictly necessary. The alternative would be that the South Caucasus was already part of the pre-IE speaking area since the Neolithic, but that would make for a larger PIE homeland, which is less parsimonious from a linguistic point of view.

The second matter I want to examine from this area is a hypothesis that, if correct, would be important, not so much for the IE languages (though it would help convince some sceptics), but mostly for the NEBA languages. It’s the origin of the Hurrians.

Hurrians from the steppe?

I’ll start by looking at some linguistic considerations that first drew my attention to this topic. For a long time, linguists have tried to find the origin of the Basque language, or at least to find some other language related to it. The most recurring suggestions have always linked it to the Caucasus languages, and more specifically to the North East Caucasus ones. This, of course, was a very controversial hypothesis, given the distance between the Basque Country and the Caucasus, together with the lack of any plausible connection at a cultural or population level. As an example of this hypothesis, here’s a quote from one of its more recent and prominent proponents, John D. Bengtson, from his book “Basque and its closest relatives: A new paradigm”:

“In direct contradiction of these kinds of statements [the uniqueness of Basque], the thesis of this book is that Basque is demonstrably related to other languages, i.e., that a scientific analysis of the evidence leads to the most probable conclusion that Basque is, at first remove, most closely related to the North Caucasian language family.”

However, with all the data that we now have, a connection between the Basque Country and the North Caucasus has become much easier to explain, given that Basques, just like all the rest of Western and Northern Europeans, came from the steppe, and that the North Caucasus borders the steppe from which they came. Everything indicates too that Basque is indeed a relict of the languages spoken by the CWH people who settled most of Europe around 3000-2500 BC, and North Caucasians (especially NE Caucasians) are the modern population that’s genetically closest to the original steppe people (like the Yamnaya people), while the Caucasus mountains are an area where their language could have survived more easily once the IE languages replaced it throughout the steppe.

The next link in this chain is the fact that those looking for the origin of North (especially NE) Caucasian languages have found Hurrian and Urartian as the most likely ancestors. While I can’t assess any of this from a linguistic point of view, I’d like to look at the genetic evidence that we have and that could help solve these questions.

We know more or less (indirectly) that people from the steppe started to cross the Caucasus around the second half of the 3rd mill. during the late Yamnaya period or early Catacomb one (the Catacomb Culture people were a continuation of the Yamnaya people). We more or less know that horses were domesticated around the middle of the 3rd mill. in the steppe, somewhere between the Caspian and the Black Sea (link). And these horses must have started to be traded across the Caucasus shortly after (the earliest sample of a domestic horse of the modern type that we have comes from Anatolia ca. 2100 BC). Whether the domestication of the horse and its trade was the reason why people from the steppe started to venture into West Asia is unclear, but it probably helped that the trade was established.

The oldest references to Hurrians that we have date to around that period (they were established in North Mesopotamia around 2250 BC). Their strong connection to horses is well known:

It seems that one of the first important results of the Mozan/Urkesh excavations, at least from the point of view of Indo-European studies, was the discovery of a beautiful sculptural image of a horse head dating from the middle of the third millenium B.C. From much later representations of horses, possibly continuing the same Hurro-Urartian tradition, one may particularly compare a bronze horse head from Karmir-Blur (VIII c. B.C.). Subsequent findings in Mozan/Urkesh have shown a number of horse figurines coming from the storeroom of Tupkish’s palace (about 2200 B.C.), some of which represent the domesticated animal. These numerous figurines, which belong to the following period of the history of Urkesh in the last quarter of the III mil. B.C., make it clear that the horse was extremely important in the life of the society. Particularly interesting seem horse figurines showing the harness, thus documenting the use of horses in transportation. Horse Symbols and the Name of the Horse in Hurrian, Vyacheslav V. Ivanov, 1998.

From the point of view of ancient DNA, we have some interesting clues so far. The first one comes from a site in the Levant, Tel Megiddo, in modern day Israel. During the mid 2nd mill. this area is said to have had a significant Hurrian population, and apparently Tel Megiddo itself had a king with a Hurrian name. We have many samples from this site, and all of them are of local origin except 3 outliers (two of them are brother and sister, so grouped as one, and dated to 1600-1500 BC, while the third one is dated to 1688-1535 cal BCE). This is what the local samples from the same period look like:

And this is what the outliers look like:

Clearly, these outliers had steppe origins, with the brother (the only male) probably having the typical paternal lineage of the Yamnaya people (but due to low resolution in the Y chromosome we don’t know for sure since it’s just labelled as R without the subclade). Of course, we don’t know if these outliers were Hurrians or not, but given the historical knowledge it seems more likely that they were indeed Hurrians rather than some random travellers.

The second clue comes from later Hurrian and Urartian samples. These date to ca. 1000 BC and later, so their steppe ancestry has been greatly diluted, but the males still largely carry the Yamnaya paternal lineage.

None of these clues alone can tell us if Hurrians came from the steppe, but together they do make for a compelling case. Ultimately, we’ll need to wait for samples from early Hurrians (pre-2000 BC ideally) to know with certainty. However, things may become a bit more complicated when we take a look again at a possible role of the Yamnaya population from the steppe when we get back to Europe.

Central Asia and North India

Finally we get to the last area that is relevant for the IE question. When it comes to Central Asia, we have to divide it into three parts: the North (mostly Kazakhstan), which was part of the steppe and was settled by the CWH people around 2000-1400 BC with the Andronovo Culture; the South (Turkmenistan, Uzbekistan and Tajikistan, which we will refer to as Turan, following the literature published about it), which had a local population dating back to the early Neolithic period; and the eastern edge (Tajikistan, Kyrgyzstan and SE Kazakhstan, up to the Altai Mountains), which we will refer to as the Inner Asia Mountain Corridor (IAMC) and which has its own distinct population dating from the Paleo-Mesolithic period.

The period between 2000-1500 BC is the critical one when it comes to assessing the linguistic side of things, since during that period we have the different populations from Central Asia, plus the population of North India, plus a population that reached the Near East (the Mitanni), all speaking the same language: an early form of Indo-Iranian that was close to Sanskrit (Sanskrit itself being the form spoken in North India at the time; for example, on the dating of the Rig Veda, David Anthony’s “The Horse, the Wheel and Language” (2007) states: “The oldest texts in Old Indic are the “family books,” books 2 through 7, of the Rig Veda (RV). These hymns and prayers were compiled into “books” or mandalas about 1500-1300 BCE, but many had been composed earlier.”). This means that, since the population from the steppe had just arrived in the area from the west, either they switched to the language spoken in those other places during the 2000-1500 BC period, or they managed to spread their own language to all of those places during that same period of time. The most accepted traditional view has been that the latter is what happened. Here, instead, we will explain that the former is the scenario that is compatible with all the data that we have.

With regard to genetics, it’s relatively simple. What we see is that during that period of 2000-1500 BC there is low-level admixture in both the northern (steppe) and southern (Turan) populations from each other. This was largely mediated via females, since the male lineages remain largely unchanged in both of them. Basically, there’s really not much in the genetics that would suggest a language shift in either of them, though there is enough to see that they were in contact, and therefore a language transfer is compatible with the data. But this had to be due more to cultural exchange than to actual migrations. Here are the samples we have from Turan from that period (minus two outliers from Bustan that look to come from the South Caucasus). The earliest we have from after 2000 BC is dated to 1650 BC and they go down to 1250 BC:

Meanwhile, the steppe populations during that same period were much more diverse (it’s a much larger area too), with some complex admixture in many individuals, while others stayed much more unadmixed as seen in the two figures below:

The archaeology on which the traditional view of Indo-Iranians being originally from the steppe is based is now mostly outdated. For example, Elena Kuzmina considered that “The Andronovo provenance of the fire-cult and the cremation rite is beyond dispute” (The Origin of the Indo-Iranians, 2007). She goes on to remark on its importance for the spread of Indo-Iranian from the steppes to the south:

Northern Bactria provides a unique opportunity to trace the southward migrational process of the Andronovo population and its assimilation with the locals. Since the material culture of the aborigines was highly developed and adapted to the ecological environment, the newcomers adopted in its entirety the complex of their material culture, while retaining their ethnical distinction in the most important sphere—ideology: in the cults and burial rite. As is well known, the principle condition for maintaining ideology in traditional culture is the preservation of the language which conveys mythological concepts and ritual texts. […] Since in the assimilation process in northern Bactria it was the ideological concepts of the Andronovans that took the upper hand, it means that their language conveying ideology and ritual activity became the winner too.

However, since then, it has been found that the cremation and fire cult have clear antecedents in the population from the IAMC, at sites like Begash and Tasbas, as David Anthony has already pointed out:

“The pre-Andronovo mortuary custom of cremation documented at Tasbas and Begash continued into the Andronovo period as a distinctive trait of Fedorovo mortuary rituals in the Tien Shan region but with the addition of a kurgan, stone fences, and other Andronovo traits absent from the Begash Ia and Tasbas level 1 mortuary customs.” Samara Valley Project and evolution of pastoral economies in Eurasian steppe (2016).

A recent paper with new dates from the Tulkhar necropolis (Bishkent Culture, now dated to 2800-2400 BC) confirms the same:

“The new materials and the new calibrated radiocarbon dates significantly amend the understanding of many processes that took place during the Bronze Age both in Central Asia and far outside of it. Materials of the Early Tulkhar Necropolis (South Tajikistan) are often used to prove active contacts between the steppe livestock-farming Andronovo people and the settled crop-farming Central Asia people. Andronovo influences in the first place are found in the cremated burials of this necropolis. E.E. Kuzmina considers these burials archaeological evidence of her hypothesis about the Andronovo people (Indo-Aryans) migrating across Central Asia (Bishkent culture) to the North-West Pakistan (Swat culture) and North India. The new materials and the new calibrated radiocarbon dates recently appeared. They prevent relating the Andronovo people and the cremated burials in the Early Tulkhar Necropolis. The South Urals Fedorovo culture stands out with cremated burials and dates back to 1742–1451 calBC according to the latest data. The Tulkhar cremated burials appeared a lot earlier, namely no later than in the early 3rd millennium BC.” Materials of the Early Tulkhar necropolis in the light of the hypothesis of Andronovo population migration to the south: problems of chronology, Svetlana V. Sotnikova, 2024.

For reference, here’s what Elena E. Kuz’mina (The Origin of the Indo-Iranians, 2007) thought about the Bishkent Culture:

“The origin of the culture is open to debate. B. Litvinsky and L. P’yankova believe that the culture is genetically related to the BMAC and reflects a change-over of a part of the farming population to pastoral stockkeeping. A. Mandel’shtam and E. Kuz’mina, on the other hand, hold that it was created by Andronovo pastoralists and, possibly, representatives of the Zaman-Baba culture. They came to use the ceramics of the neighboring farmers and also began making hand-made pottery, which imitated in shape that produced on the potter’s wheel.

Of decisive importance is the evidence concerning the burial rite. The early monuments of the Bishkent culture maintain the characteristic features of the Andronovo Fedorovo burial tradition: burial mound, enclosure, stone cist, cremation, swastika, and the hand-made ceramics. Later there appeared graves with a downward passage and catacombs. The origin of this rite in Central Asia remains debatable. It is known both in the Bactria-Margiana culture, but its genesis there is unclear, and in the Zaman-Baba culture where it may be a heritage of the Catacomb culture of the European steppe. In types II and III of the Bishkent burials Andronovo features are preserved: burial mounds and stone enclosures, small cists, the position of the deceased and the custom of double-burial, the round and rectangular shape of the sacrificial hearths, the vivid manifestations of the fire cult. As long as the burial rite is an ethnic indicator of a culture, which is upheld even during long-distance migrations to another ecological niche, and wheel-made ceramics are quickly borrowed by new-comers, there are serious grounds to believe that the creators of the Bishkent culture were by origin Andronovo pastoralists, who came into contact with representatives of the BMAC, which is also expressly indicated in the farming culture of Tadzhikistan and Uzbekistan.”

The Bishkent-related Vakhsh Culture also predates Andronovo:

Recent discoveries and radiocarbon dates provide good evidence to consider anew the Vakhsh culture of southern Tajikistan. This “culture” is almost exclusively identified by its burials under kurgans (“classical Vakhsh culture”) except for one settlement, and by its handmade pottery. A detailed classification of the pottery coupled with the available dates or comparisons is presented here. It can now safely be dated between the second half of the 3rd millennium and the 17th century BC as shown by radiocarbon dates and is thus contemporary with the Bactria-Margiana Archaeological Complex (BMAC). A few Vakhsh pots have been found in southern Bactria up to Herat and parallels can also be found in graves from Gonur Depe. It has no connection with the Andronovo culture but presents affinities with communities of the Altai-Xinjiang area. The “classical Vakhsh culture”, Mike Teufer, 2020.

Again E. E. Kuz’mina on the Bishkent-Vakhsh Culture (which she considered together):

“A. M. Mandel’shtam (1968: 131-141) conducted a systematic analysis of the funeral practice of the Bishkent (Vakhsh) culture and demonstrated specific correspondences with Indo-Aryan practices. He viewed the Bishkent culture as cattle raising, coming from the north-west in transit to India, and he noted its similarity to the Andronovo culture. B. A. Litvinsky (1964: 158; 1967: 122-126) connected this culture with the Nuristani languages and showed its analogies in Swat. E. E.Kuz’mina (1972 a: 134-143; 1972b: 116-121; 1974: 188-193; 1975: 64-7) emphasized the Indo-Iranian attribution of the culture, its connection with Swat and Gomal and the participation of the Zamanbaba and Andronovo components in its formation.”

Moreover, the archaeology that Kuzmina cites for the expansion of the Indo-Iranians to the south (from the steppe) is dated to very late layers of the sites she mentions, like Bustan or Dzharkutan, where steppe finds are in the layers from around 1000 BC, which is 1000 years too late for the spread of Indo-Iranian (the samples we have from those sites that date to the period 1650-1250 BC are local people, with the slight steppe admixture seen above). She also refers to the light skin and eyes of some modern populations of North India/Pakistan as proof of the steppe origin of their language, which is irrelevant for many reasons that I won’t go into here.

Basically, there is no evidence at all for the sort of huge events that should have happened in order for the Indo-Iranian languages to spread from the steppe to such a big area in such a short period of time. Nor is there any evidence that the people from the steppe could have spoken an IE language in the first place (quite the contrary, as already seen from other areas). Instead, we have a much easier explanation: the steppe populations acquired the Indo-Iranian language from their southern neighbours, along with much of the culture, technology, rituals and economy (for the change in the economy of the steppe population before and after the contact with the populations of Turan and IAMC, a graph (figure 16.12 here) from David Anthony’s “The Horse, the Wheel and Language” (2007) is quite revealing, showing the change from an animal-based diet to a mixed one).

When it comes to India, unfortunately the ancient DNA record is almost completely missing. Very few samples (to my knowledge) have been analysed so far and none of them published. But the DNA we have from the surrounding areas already tells us with high confidence what the early Vedic people should look like: basically just like their predecessors from the Indus Valley Civilization. We don’t have direct samples from the latter either (except one of very low quality that was published years ago), but we have outliers from the surroundings that clearly had an Indian origin (known as Indus Periphery samples). The ones from the Indus Valley itself should look similar but with a significantly higher proportion of the specific Indian signature, usually referred to as Ancestral South Indian (ASI) or Ancient Ancestral South Indian (AASI). And indeed, the unpublished samples from the core Vedic area dating to the mid 2nd mill. (late Rigvedic period) are, as far as I know, exactly like that. But we still have to wait for samples to be published in order to be certain about it.

Some of the genetic remarks in the literature that suggest that Indic speakers came from the steppe are based on modern DNA, and as in the case of Kuzmina’s mention of the light skin and eyes of modern Dardic and Nuristani people, I won’t comment on the details of why they are irrelevant. Overall, the hypothesis of Indo-Iranian languages reaching India from the steppe is simply not possible with the current data available. If some surprising evidence emerges at some point we could revisit the subject, but for now there’s not much more to say about it.

Now let’s briefly mention the Mitanni people that moved to the Near East in the 2nd mill. BC. They have usually been considered an Indo-Aryan population (rather than Iranian), but that’s just because at the time they started to move to the west (likely around 1900 BC or slightly later), Proto-Indo-Iranian (PII) was just starting to break up and all the dialects from that time are similar to Sanskrit. The Mitanni Kingdom itself is first mentioned around 1550 BC, but the people must have started to arrive (from Turan) quite a bit earlier. We lack Mitanni samples so far, and the closest we have is an outlier from the site of Alalakh, in the Levant, dating to ca. 1550 BC, which has a clear origin in Turan. But of course, we don’t know if it’s a Mitanni sample or not. However, given the origin of the Mitanni and their language, they should all look the same as that sample, i.e., like all other samples from Turan (though obviously with increasing local admixture as time passes, like the one shown by the later Iron Age samples from Ascalon in the Southern Levant, included below too, dating to around 1200-1100 BC). Once more, we’ll have to wait for more relevant samples to confirm this.

With the above said, the question still remains as to where Indo-Iranian originated. In my opinion, the only way to explain the successful spread of the language is to consider that Proto-Indo-Iranian became a prestige language and eventually a lingua franca during the mature period of the IVC and BMAC, which would be around 2500-2000 BC. There seems to be no other way that can easily explain the fact that this language was spoken in both places at the same time. We do know that these two civilizations had intensive contacts, so it seems reasonable to think that during the peak of their development and trade, they established a common language that became the language of all the people in those areas, as well as those in contact with them. Whether the original pre-Indo-Iranian was spoken in one place or the other is something that would be much more difficult to assess, so I won’t get into it. After the collapse of these two civilizations, the language must have started to break up, but we still know that during the period immediately after (2000-1500 BC) the resulting dialects must all have been quite similar (Sanskrit in North India, Mitanni in the Near East and the language of the early Scythians on the steppe).

The big picture

The devil is in the details, they say, so we’ve first gone through the most important ones of each area. Now it’s time to step back and look at the big picture:

Approximate extension of steppe populations and Indo-European languages c. 2000 BC

Notice that the above map is not intended to be accurate in the details, but just to give a broad approximation. For the steppe populations, the dotted areas represent where they were alongside local populations, while the solid area is where they were the only population living in that huge area and speaking their native (NEBA) language.

The PIE homeland

From all of what we’ve commented so far, as well as from the map above, it may be clear to anyone that’s gotten this far that the PIE homeland must be placed in North Iran and Turan. The two main factors that make it necessary to place it there are the presence of IE languages in India and the Tocharian language in Xinjiang (China). From further west, those two things would be too difficult to explain.

The origin of the language must have been in the South Caspian area, from where it went with the Neolithic to Turan. These areas must have spoken pre-IE from the early Neolithic. PIE would be the phase prior to its expansion outside of that homeland, which would be close to 4500 BC. We lack ancient DNA from India to know the date of arrival of the population that formed the North Indian one, but certain anthropological studies suggest that there was a change around 4500 BC. And from the samples that we have from later dates, we know that North Indians can be modelled as a mixture of populations from Turan and ASI. What we don’t really know is if this possible migration to India meant a split in the PIE language or if it stayed as a language continuum due to the continuous contacts. Regardless of the level of divergence that may have existed, it was later erased when Proto-Indo-Iranian became the common language in the 3rd mill.

The first known split, then, should be the one that led to the Anatolian branch, which as mentioned before must have happened when people from North Iran moved to the South Caucasus ca. 4200 BC. The divergence, though, didn’t happen in the South Caucasus, where the language stayed close to the core area, but rather when it went from the South Caucasus to Anatolia somewhere around 4000-3500 BC. It must have been in the southern parts of Anatolia where the language stayed more isolated from the rest and diverged from the other branches.

The next split had to be the one that led to Tocharian, and for this we’ll have to look a bit closer at the IAMC.

The Inner Asia Mountain Corridor

This corridor at the eastern edge of Central Asia had a native population that was genetically what has been called Ancient North Eurasian (ANE). This genetic profile was also found throughout Siberia in the Paleolithic, and forms part of the ancestry of Native American populations (admixed with East Asian). In its pure form, it survived from the South Urals to the Altai and through this IAMC well into the Holocene. We have a Mesolithic sample from the site of Tutkaul (Tajikistan) dated to around 6200 BC, a time corresponding with the Hissar Culture, which probably started to have contacts with the neighbouring Neolithic regions and eventually led this population of the IAMC to adopt pastoralism during the 6th mill. (link). We have evidence (though indirect) that this population was moving between Central Asia and China, since they’ve been found to have seeds that originate from both places (see, for example, Frachetti et al. 2014). Some indirect evidence comes too from faunal remains in Inner Mongolia (China), where domestic sheep of the West Asian type have been found and were probably there since the mid 5th mill. (link). In the Altai, we have the earliest evidence from seeds too, dating to the end of the 4th mill. (link), though it’s probable that they were there earlier.

What the evidence suggests is that this population adopted an IE language from their southern neighbours (from Turan) at an early date, probably before 4000 BC. It may have been around 3500-3000 BC when part of this population settled in a more permanent way in Xinjiang, which led to the partial isolation of their language, which would evolve into Tocharian (while those who stayed along the IAMC would have continued to evolve their language in conjunction with that of Turan, becoming speakers of Proto-Indo-Iranian when it became the language of BMAC). From a genetic point of view, we can look at a few samples that would support this idea.

A sample from the site of Dali (Kazakhstan), part of the IAMC, dated to 2700 BC, already shows some admixture from both the southern neighbours of Turan and from the steppe population that arrived in the Altai region ca. 3000 BC (Afanasievo), which shows how these people were moving along the IAMC from north to south. However, samples from a later date (c. 2000-1800 BC) from the Tarim Basin in Xinjiang show them to be unadmixed, suggesting a larger degree of isolation from before the date of the Dali sample:

Archaeological and genetic evidence already provide a good basis for the idea that Tocharian must have come from this population (the idea that Tocharian may have come from the Afanasievo people lacks both types of evidence, for example), and this origin also avoids the linguistic problems that were always found in the alternative Afanasievo hypothesis.

The importance of this population when it comes to the spread of IE languages doesn’t end there, since as we’ve seen before they may have been the first ones responsible for introducing the Indo-Iranian language to the steppe people around the eastern Altai region, where their influence is visible in the Fedorovo burial rites.

Back to Europe

We’ve seen so far a probable way in which IE languages must have reached South East Europe, and now we’ll have a look at how they spread to the rest of Europe. The details are still fuzzy and they are not too important for the purpose of this post. The Steppe Hypothesis required a more detailed explanation, given that not much else other than the languages would have come from the steppe. (Obviously, the discovery that the people of most of Europe came from the steppe gave the theory a perfect basis. It’s just that the times and places of their expansions do not match those of the IE languages, and that’s now its main problem.) By contrast, when it comes to something going from West Asia to SEE and then spreading to the rest of Europe, there’s nothing controversial about it. Basically, everything came from West Asia to SEE and then spread throughout the continent, whether it was farming, innovations like metal working in its different varieties, writing systems, coinage, civilizations themselves or even Christianity. That’s just the natural way things went in ancient Europe.

We first have to look at the possibilities of how IE languages spread throughout the Balkans. We’ve provided a credible scenario for Greek, but is that scenario valid for the rest of the Balkans? Let’s look at this problematic question (indeed, the most problematic one). The first solution would be that the cultural package from Thrace and Greece was largely responsible for spreading the IE language throughout the Balkans, since outside Bulgaria and Greece (maybe Romania to some extent) there doesn’t seem to be any West Asian ancestry between 2000-1500 BC, which would be the time when we’d need IE languages to have spread throughout the Balkans. In any case, let’s take the chance to have a closer look at the population dynamics that took place in the Balkans during the Bronze Age, which will also show the big difference with Northern and Eastern Europe. We have just enough samples from Bulgaria to show this process:

People from the steppe (Yamnaya Culture) started to move to the Balkans c. 3200 BC and this is what steppe communities from Bulgaria c. 3000-2800 BC looked like:

And here are contemporary local communities from that early period:

As can be seen, there’s a stark difference between them, with the steppe communities having very little admixture from locals, and local communities having very little admixture from the steppe ones (with two outliers at the bottom, from a first and second generation admixture event presumably).

After a few centuries, this is what a steppe community would look like (c. 2400 BC):

And what a local community looked like after a few centuries too (2800-2500 BC):

The steppe community had only one third of its steppe ancestry left, while the local community had some 10% admixture from the steppe. By the time the communities finished admixing this is what they looked like (samples from the Early Iron Age, c. 1000-500 BC, since we lack samples from the Late Bronze Age, but they should be about the same):

Once the communities from both sides fused, their paternal and maternal lineages should more or less correspond with the amount of admixture contributed by each community. For example, in the samples above, there are 6 males: 5 of them have local paternal lineages and 1 has a steppe paternal lineage.
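As a quick back-of-the-envelope check of this kind of comparison, here is a minimal Python sketch. The 6 males and the 5/1 split come from the samples above; the 15% expected steppe share is purely an illustrative assumption (not a figure taken from the data), and the helper function is just made up for the example. With so few males, a single steppe lineage is compatible with a fairly wide range of admixture levels, so the correspondence can only ever be approximate.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent draws with rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# From the text: 6 males, 1 with a steppe paternal lineage, 5 with local ones.
males = 6
steppe_lineages = 1
observed_share = steppe_lineages / males

# ASSUMPTION: hypothetical steppe share in the fused community,
# chosen for illustration only.
assumed_steppe_share = 0.15

# Chance of seeing exactly 1 steppe lineage among 6 males if each lineage
# were drawn independently at the assumed rate.
p_one = binomial_pmf(steppe_lineages, males, assumed_steppe_share)

print(f"Observed steppe paternal share: {observed_share:.1%}")
print(f"P(exactly 1 of 6 | assumed share {assumed_steppe_share:.0%}): {p_one:.2f}")
```

Changing `assumed_steppe_share` shows how weakly six lineages constrain the underlying proportion, which is why "more or less correspond" is the right level of claim here.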

In the Western Balkans it seems like steppe communities represented a higher percentage of the population, since from 2000 BC onwards we see some 30% steppe ancestry in the mixed communities (with paternal and maternal lineages from the steppe also being at around that level). Here are some Late Bronze Age samples (c. 1200-1100 BC) from Montenegro:

Clearly more steppe admixture and no West Asian admixture.

To reiterate what was said in the first part of this post, the sort of evidence shown here is the kind that we lack from Northern and Western Europe. Not because we lack samples (we actually have a lot more) but because at the time the steppe communities started to arrive, the Neolithic ones were mostly gone, and where they still lived it was just long enough for the steppe communities to take females from them before they died out. So not only do we lack direct evidence of any of those few communities that survived until the arrival of the steppe people, we also can’t show what a fused community between steppe and local would look like after several centuries because such a thing never happened. There were no mixed communities. The only ones that existed were the ones from the steppe, with 100% of the paternal lineages being from the steppe.

Back to the problem about the Balkans. We need the Indo-European languages to be all over the Balkans by 1500 BC or shortly after, but we don’t have any clear evidence of how this may have happened. Genetics don’t give us any solution, so only archaeology can help here. The spread of IE languages had to have been mostly a cultural transmission, but I will leave this for people with more expertise in the archaeology of the Balkans and meanwhile offer a possible alternative to this cultural transmission.

We could speculate that the Yamnaya people had already shifted to an IE language from the Caucasus in the period from 3500-3300 BC (i.e., after the CWH people had separated, and probably Afanasievo too). A language transfer across the Caucasus is much more plausible during this Maykop-Novosvobodnaya phase than anything related to the preceding Darkveti-Meshoko period. And the language transfer would go the natural way, from the more settled, higher culture society to the more mobile, pastoralist one. In this scenario, Yamnaya would have spread the ancestor of Italo-Celtic, Germanic and pre-Balto-Slavic to the Balkans before being replaced in the steppe by the Srubnaya Culture (c. 1800 BC), which would have brought a non-IE language again until the arrival of the Scythians. However, while having evidence of actual people from the steppe moving around the Balkans seems better than no evidence from West Asian admixed populations, it’s still true that they were the minority, lived separated from the locals for quite a while and didn’t have a superior culture that would attract the locals to it. Rather the contrary. So it’s up to each reader to decide if this scenario really improves things over the first one, where cultural transmission would be the basic reason for the adoption of IE languages. Lastly, this alternative scenario would be incompatible with Hurrians being from the steppe, so if the latter is confirmed it would invalidate this possibility.

From the Balkans to the rest of Europe it won’t be of much help to get ancient DNA because people within Europe were already very similar to each other, and it’s difficult to detect movements of people from genetics unless we have a very high resolution. From what we know about Celtic or Italic, we shouldn’t expect a large number of people to have been migrating with the languages as they expanded (and so only a very small genetic impact). Following the spread of innovations like iron or war chariots pulled by domestic horses may be a better way to track the spread of IE languages throughout the rest of Europe (war chariots may have already played a role in their spread through the Balkans, at least in Greece). I will be very brief about this, since the details of it are beyond the scope of this post.

For the Balto-Slavic languages we have some constraints that allow us to know the approximate place and time where they formed, since their formation was strongly influenced by Indo-Iranian. We know that Indo-Iranian started to be adopted on the steppe at its eastern edge around the Altai region of South Siberia shortly after 2000 BC. These early adopters could be considered Proto-Scythians, and the genesis of the Scythian culture throughout Central Asia would continue till around 1200 BC. At that point, they started to move to the west through the steppe, replacing the preceding Srubnaya Culture (which like the Andronovo Culture was a descendant of the Sintashta Culture, but didn’t go through the language shift that happened in the Andronovo one and therefore would have still spoken its original NEBA language). The Scythians may have arrived at the western edge of the steppe around 1200-1000 BC, which would be the earliest date for starting contacts with IE speakers from the adjacent area in Europe.

The population that would become Proto-Balto-Slavic must have already spoken an IE language by then, but in an older centum form. This would mean that IE had already spread by then to the north of the Balkans, as was said earlier to be required. A good candidate, given its time and location, for being the culture where Proto-Balto-Slavic formed would be the Chernoles Culture, which started at the end of the 2nd mill. and continued until 500-200 BC. If these were Herodotus’ Scythian ploughmen, as speculated (no reference there by whom or why), it would align very well with this possibility, since we should be looking for a population native to Europe and made up of sedentary farmers, not nomads, but who shared several cultural traits with the Scythians, which would easily explain the influence on their language too.

Notice that dating Proto-Balto-Slavic to around 1000-500 BC and to that approximate area is something necessary due to the clear Indo-Iranian influence that cannot be explained in any other way. After that formation period, we’d have the Baltic branch separating around the latter stage and expanding to the north. The details of this are something I’ve not tried to figure out and they are not relevant for the purpose of this post. The important thing is that the case of Balto-Slavic formation, which can be located and dated with significant accuracy, should serve as an example of how IE languages formed and spread to the rest of Europe from the Balkans. Baltic languages (considered to be a very old form of IE) dating to around 500 BC, when they started to expand to the north, should also help to put into perspective the age of IE languages in Europe.

The details of other language branches should be somehow analogous. Italic and Celtic proto languages (whether one prefers to consider Italo-Celtic a proto language or just some areal features that defined an Italo-Celtic sprachbund area) would have formed in the North Western parts of the Balkans and adjacent areas, with Italic then separating and moving into the Italic peninsula while Celtic expanded to the west from around the eastern part of the Alps, mostly as always proposed.

Germanic is the least clear one, but it should have been a similar process. If we consider that the Chernoles Culture was roughly preceded in the area by the eastern part of the Trzciniec Culture, and that preceding Trzciniec culture had already become IE (somewhere around 1800-1500 BC), then the western part of it would have become IE too and would already be in the right place and time to be ancestral to Proto-Germanic. I’m not specifically proposing that scenario, it’s just for the sake of giving an example.

UPDATE: Checking the available samples, I noticed we have a good sequence from Czechia from the Bronze Age to the Iron Age. Looking at them, I see a significant change between 1600 BC and 1500 BC approx. coinciding with the end of the Únětice Culture and the beginning of the Tumulus Culture:

The samples from the Tumulus culture are dated to 1500-1250 BC, without C14 dating and there are only 4 of them. But the change is persistent through the Late Bronze Age (samples from Knoviz, c. 1100 BC, not shown) and into the Early Iron Age samples from Hallstatt period below:

The Bulgaria EBA samples are a relatively distant source, so the significant 27.5% impact is underestimated. With a more proximate source (like samples from Mokrin, in the Serbian border with Hungary, dating to c. 2000 BC) the impact is around 40%. Quite a big change in a small period between two consecutive cultures.

I leave here some references to some interesting papers related to the formation of the Nordic Bronze Age (c. 1600 BC) and its connections to the Carpathian Basin and ultimately the Aegean world (thanks to Jaydeepsihn Rathod for pointing this out in the comments):

  •  Issues with the steppe hypothesis: An archaeological perspective Iconography, mythology and language in Neolithic and Early Bronze Age southern Scandinavia. Rune Iversen, 2024.
  • “It is therefore not surprising that Europe and the Aegean during the 15th–14th centuries bc shared the use of similar efficient warrior swords of the flange-hilted type, as well as select elements of shared lifestyle, such as campstools. Linked to this are also tools for body care, such as razors and tweezers. This whole Mycenaean package, including spiral decoration, was most directly adopted in South Scandinavia after 1500 bc, creating a specific and selective Nordic variety of Mycenaean high culture that was not adopted in the intermediate region (Kaul 2013). This could hardly have come about without intense communication and practice by travelling warriors or mercenaries. Swords come in different types and have different fighting styles (Kristiansen 2002; Molloy 2010). Therefore they are not easily adapted: they are part of a system of warfare and skills that demand long-term training. Furthermore they demand changes in social organisation in order to sustain the new role of warriors. It therefore seems likely that warriors were at the same time also traders, or they accompanied traders to protect them. We may therefore accept that the shared use of sword types among Scandinavia, Central Europe, and the Aegean during this period would also lead to similarities in the social institutions linked to warriors. This seems indeed to be the case: the dual organisation of leadership between a Wanax and a Lawagetas in the Mycenaean realm is replicated in the Nordic realm, which also copied Mycenaean material culture closely (Kristiansen & Larsson 2005, chaps 5.4 & 6.5).” Kristiansen K, Suchowska-Ducke P. Connected Histories: the Dynamics of Bronze Age Interaction and Trade 1500–1100 bc. Proceedings of the Prehistoric Society. 2015;81:361-392. doi:10.1017/ppr.2015.17
  • “A suite of linked histories across Europe transpires, when attaching importance to the fact that the time period in which it all began, c. 1600 BC, was a turning point on a European scale. The precise timing may be debated, but it is here suggested that the link of change could ultimately have emanated from the early post-eruption Aegean with embryonic Mycenaean hegemonies. […] While commencing c. 1600 BC, NBA IB, in a manner of speaking, did not come into full fruition until c. 1500/1465 BC in NBA II, which is therefore justifiable as the first true highlight of the southern Scandinavian Bronze Age (Kristiansen, 1998; Kristiansen & Larsson, 2005). In NBA II, however, the Carpathian connection is no longer culturally visible but rather completely absorbed in the now uniform Nordic koiné. Instead, clearer glimpses of Mycenaean cultural impact occur in Scandinavia (Kristiansen & Larsson, 2005). This is now sustained by the testimony of lead isotope analyses (Ling et al., 2014). The Aegean seems from 1500 BC directly included in the Nordic sphere of interaction.” Vandkilde H. Breakthrough of the Nordic Bronze Age: Transcultural Warriorhood and a Carpathian Crossroad in the Sixteenth Century BC. European Journal of Archaeology. 2014;17(4):602-633. doi:10.1179/1461957114Y.0000000064

What’s missing and perspectives

There are many details missing in this brief overview, but I want to point out the ones that are technically missing in order to confirm (or deny) the basics of what I have explained here:

  • Getting samples from North India dating to the early Vedic culture (2000-1500 BC) to confirm (or deny) that they were local people. I’d give this a probability of > 95%.
  • Getting samples from North India dating to the period of 5000-4000 BC to see if there was a big change in the population at that time which could correspond with the arrival of IE speakers to the subcontinent. The probability of this I’ll leave it as “unknown”.
  • Getting samples from early Hurrians (2300-1800 BC) to know if they came from the steppe. This one is not too important for IE questions (except for that alternative possibility of Yamnaya-Catacomb cultures being IE), but if Hurrian could be confirmed to be a steppe (NEBA) language, it would be the key to investigating the whole language family. I’d give this a 60-70% chance of being correct.

There are many other samples that we are missing and would help for better knowing all the details, but I’ve listed the most important ones for the purposes of this post. Let’s hope that we don’t have to wait too long to get the answers.

Now I’d like to summarise the languages that may have come from the steppe (those I’ve been referring to here as NEBA languages). If the existence of this language family can be confirmed, it would become a very interesting and important subject for the study of European linguistic (pre)history. The fact that linguists can now know with certainty that all of northern and western Europe was repopulated by newcomers from the steppe between 3000-2300 BC, and therefore that all of that area can have one and only one substrate, common to the whole area, is an amazing step towards finally being able to study the substrates of Europe in a scientific and coherent way. Here is a list of the more likely candidates to be part of this proposed NEBA language family:

  • Basque/Aquitanian (> 95% probability).
  • Iberian (> 95% probability).
  • Tartessian (if not a Celtic language, > 95% probability. If Celtic, then it’s Celtic. I could mention Pictish in this same category, though Pictish has a much higher chance of being a Celtic language).
  • Etruscan (> 60% probability. Further aDNA samples won’t tell us more than what we already know, so it’s essentially a linguistic issue).
  • Hurrian (60-70% as it stands now, but getting the right samples from ancient DNA would confirm it or deny it with almost 100% certainty either way). Urartian would be linked to the outcome of Hurrian.
  • North East Caucasian (~50% chance. It depends a lot on the outcome of Hurrian).
  • North West Caucasian (Unknown probability. It’s strictly a linguistic issue, largely about it being related to NE Caucasian or not).
  • Uralic languages (~50% chance. See Appendix I for some insights into the matter).
  • Paleo-Sardinian (poorly attested, it’s again a linguistic issue where aDNA has already told us it’s at least possible. Probability around 30%?).

Finally, I’d like to stress that this post is in no way intended to be particularly complete (that would require writing a book with a lot of research) nor a definitive solution to the problems it tries to address. As the title says, this is an alternative view (interpretation) of the evidence we have so far, and my hope is that it can serve as a framework for linguists interested in IE languages or Old European languages to better understand the data that is available and decide to what extent they agree with one view or another, as well as serving as a way to assess future ancient DNA studies against some of the ideas and predictions contained here to see how they fit with either view.

158 thoughts on “Origins and spread of Indo-European languages: an alternative view”

  1. Hello Alberto,

    I read your article 2 weeks ago and found it very convincing, since then I have been gradually making my way through the comments and one discussion (I believe with Gaska?) piqued my interest.

    The comment I’m referring to alluded to some of the CWH showing increased cultural similarities to the broader neolithic communities, adopting an agro-pastoralist lifestyle amongst other things.

    I’m not sure if the original poster made this following claim specifically but I want to make it out of curiosity and to see if you have any refutations:

    If we acknowledge the Bell Beaker culture ultimately began in Pre-Steppe communities in Iberia and state that this same culture was imported in some form to R-P312 clans beginning the NW BBC culture, is that not enough evidence that could imply a language transfer?

    Is it possible that the NW BBs adopted a “Vasconic-like” language from their interactions with the Iberian Bell Beakers? And is it thus possible that the original language of the CWC is not related to Basque? What’s your specific evidence for the BBs being explicitly related to the CWC linguistically?

    I would also like to specify, I agree with your placement of the PIE homeland in South-Central Asia but would also like to introduce some speculation that perhaps PIE entered the steppe shortly after the Proto-Anatolian migration and that this language developed into Tocharian and spread with the Sintashta-Andronovo into Tarim? I believe this vaguely fits the timeline as far as divergence is concerned? This would thus make the CWC an early form of IE, related to Tocharian but unrelated to later European languages which I do agree spread later from the Balkans.

    My theories are incomplete, of course, and I don’t have any specific evidence beyond what’s already been introduced in these comments, I’m merely throwing these ideas out to see if they have any merit.

    Excited to hear your response

  2. Hello Atkan,

    Thanks for your comment. You do raise some interesting questions that require further explanation.

    The comment I’m referring to alluded to some of the CWH showing increased cultural similarities to the broader neolithic communities, adopting an agro-pastoralist lifestyle amongst other things.

    Yes, the people from the steppe were originally pastoralists and there’s very little evidence of crops in the steppe itself at that time (LN/EBA). And probably the main reason was that the largest part of the open steppe was not suitable for crops due to the weather conditions. However, as these people moved into Europe they did start to settle and increasingly adopted crop farming, given that the conditions allowed for it. To what degree this was a direct influence of the Neolithic farmers that survived in Europe until the arrival of steppe groups is something I’m not well enough informed about to have a clear opinion. The steppe people were not completely unfamiliar with crops, it’s more that the environment of the open steppe didn’t allow for a crop-based economy. So their gradual shift to an increased reliance on crops can be due to the better conditions for it without necessarily needing a direct influence of the EEFs. However, it’s likely that having those EEFs still around helped in this shift.

    If we acknowledge the Bell Beaker culture ultimately began in Pre-Steppe communities in Iberia and state that this same culture was imported in some form to R-P312 clans beginning the NW BBC culture, is that not enough evidence that could imply a language transfer?

    This is a fundamental point. In the discussion with Gaska above I already mentioned that one of the problems that people are having when understanding the BBC comes from the mixing of old and new data and that this must be sorted out by archaeologists so that we clear things up.

    There are two completely different cultures, which are completely unrelated, but share the same name due to legacy archaeological hypotheses. Now that ancient DNA has clearly shown that the BBC from the Tagus estuary (and any related Chalcolithic culture of Western Europe) is completely unrelated to the BBC that succeeded it, it’s time to separate these two cultures by giving them two different names.

    The people from the steppe that descend from the CWH and that are labelled as BBC didn’t borrow the culture from the previous people who are also labelled as BBC. The two cultures are different and unrelated. The confusion comes from some ceramics from the steppe being deposited in some burial sites of local people. The reasons why this happened can be debated, but they don’t change the fact that we’re talking about different people with different origins, culture and, therefore, language.

    For example, if you find a single male burial under a barrow with grave goods that include things like tanged daggers and Palmela points made of arsenic bronze, and that burial is labelled as belonging to the BBC, then if you get DNA from it you will find a male belonging to the R1b-P312 lineage and of steppe origin.

    Is it possible that the NW BBs adopted a “Vasconic-like” language from their interactions with the Iberian Bell Beakers? And is it thus possible that the original language of the CWC is not related to Basque? What’s your specific evidence for the BBs being explicitly related to the CWC linguistically?

    The specific evidence is also discussed above with Gaska. The BB communities where the males belong to the R1b-L51 lineage descend directly from the steppe communities that came, well, from the steppe, together with the CWC communities. The origin of the R1b-L51 people is not really debated since it’s very clear. The only thing debated is the origin of the Bell Beaker Culture, but that’s just due to the confusion explained above of having two unrelated cultures sharing the same name. Once you change the name of one of them there’s nothing else in common between the two peoples.

    I would also like to specify, I agree with your placement of the PIE homeland in South-Central Asia but would also like to introduce some speculation that perhaps PIE entered the steppe shortly after the Proto-Anatolian migration and that this language developed into Tocharian and spread with the Sintashta-Andronovo into Tarim? I believe this vaguely fits the timeline as far as divergence is concerned? This would thus make the CWC an early form of IE, related to Tocharian but unrelated to later European languages which I do agree spread later from the Balkans.

    It could have been possible that IE languages entered the steppe early enough for the CWC to have been IE, but the evidence we have says that this didn’t happen. It’s still possible that IE entered just a bit later and that Yamnaya was IE, but we’ll need more data to really know.

    Regarding Tocharian coming from the Sintashta-Andronovo culture, what’s the evidence that suggests to you that this scenario is a likely one? Andronovo became Indo-Iranian, as we know from the Scythians. How could they have carried the Tocharian language to the Tarim Basin?

  3. I may not write further posts unless something big comes out, so I’ll comment here very briefly about the new paper (pre-print) about the spread of Celtic languages:

    Tracing the Spread of Celtic Languages using Ancient Genomics
    Hugh McColl, Guus Kroonen, Thomaz Pinotti, John Koch, Johan Ling, Jean-Paul Demoule, Kristian Kristiansen, Martin Sikora, Eske Willerslev
    bioRxiv 2025.02.28.640770; doi: https://doi.org/10.1101/2025.02.28.640770

    First the obvious. There are no surprises and everything is as expected. Celtic expanding from East-Central Europe to the west, starting in the late Bronze Age and continuing during the Iron Age. A few quotes to summarise their findings (emphasis mine):

    For England, we find a series of transitions in the prominent farming ancestries present for each time slice (Fig. 3). Initially, between 4800–4000 BP, we find a individuals are modelled with a high proportion of Bell Beaker related ancestry, and the tendency to have a slightly higher proportion of local British-Irish Isles Neolithic ancestry, relative to the other Neolithic-related ancestries. By the Middle Bronze Age (4000–3200 BP), the highest Farmer-related ancestry is French/Iberian Neolithic-related rather than the local Neolithic ancestry, consistent with recent studies suggesting migrations from the mainland. This migration, specifically the Iberian connection, is further supported by evidence that the UK received copper from Iberia during this phase (3350/3250–750 BP). However, in the Late Bronze Age, we see a shift, in which the proportion of Italian Neolithic ancestry has increased to similar proportions to that of French/Iberian Neolithic. In the Iron Age, similar patterns are seen, with the additional appearance of Bronze Age Anatolian-related ancestry.

    In France, we see a similar transition (Fig. 3). During the Early and Middle Bronze Age, more local French/Iberian- than Italian Neolithic-related ancestry tends to be present. By the Iron Age, the relative proportions have swapped, so Italian Neolithic-related ancestry is the highest, accompanied by Bronze Age Anatolian ancestry.

    Further east, in the Czech Republic, we see the increase of Italian Neolithic-related and Bronze Age Anatolian-related ancestry between 3200–2800 BP (Fig. 3). We also note that we detect no evidence of French/Iberian Neolithic ancestry. By splitting further into the cultural phases for the region, we find that this ancestry profile in the Czech Republic occurred by 3300 BP, in individuals associated with the Tumulus Culture and continuing into the Knovíz and Hallstatt Periods (Extended Data Fig. 1).

    Relevant to the appearance of Italian Neolithic and Bronze Age Anatolian-related ancestry in Western Europe by the Iron Age, we included individuals from Hungary/Serbia (0_3_4_2_2_C_2800+). Consistent with the results found from using the Farmer-related ancestries as a proxy, we find the appearance of Bronze Age French/Iberian ancestry appearing in England during the Middle Bronze Age, and the Hungarian/Serbian Bronze Age reaching widespread distributions during the Iron Age (Fig. 4). In the Czech Republic, we find almost all individuals being modelled with a large proportion of Hungarian/Serbian ancestry during the Late Bronze Age.

    So that’s it, basically. Ultimately, the language came from Proto-Italo-Celtic (or an Italo-Celtic sprachbund, if one prefers such a classification) in the North-Western Balkans, and spread from there to the eastern Alpine region during the late BA, where it presumably became Proto-Celtic. Then from there it expanded to Western Europe, starting by the end of the BA and continuing into the Iron Age.

  4. @Alberto

    The argument for Sintashta and Andronovo being Tocharian comes mainly from my Indian friends who support OIT. They don’t analyze the evidence in Europe as granularly as you guys because they’re mainly concerned with refuting the idea that Indo-Iranian comes from Andronovo, and so they tend to accept the mainstream narrative on Europe, which claims that the CHG-like component spread Proto-European to the steppe and that Europeans thus get their language from them. Where they differ from the mainstream is in the claim that Sintashta-Andronovo originally spoke Proto-Tocharian prior to their assimilation into Iranic by BMAC; they claim that the Tocharian speakers in Tarim were ultimately the only group to avoid this assimilation.

    I shall look back at their arguments to see if I can go into more detail:

    “starting in 1500BC, Xinjiang experienced significant admixture from Andronovo (~50%), a stronger impact than any group found in modern India or the Iranian plateau… samples in Xinjiang have R1a subclades exactly matching Sintashta-Andronovo… if steppe = Indo Iranian why did Tocharian not shift like India and Iran did…”

    That’s as brief a summary as I will make as most of their arguments aren’t particularly relevant here and are mainly concerned with disproving the steppe hypothesis in the context of South Asia.
    Suffice to say, if the steppe hypothesis were proved even remotely correct, I would say their overall argumentation is solid.

    However, if I were to steelman my own speculation, I would say the settlement of Tarim, and thus the spread of this Proto-Tocharian, occurred before Andronovo adopted the Iranic languages, and that Iranic largely reached the NE extent of Andronovo around the period of the Proto-Scythians.

    Your rebuttals are, however, convincing to me; I’ve heard comments in the past from people such as Survive the Jive, who also state there isn’t any clear evidence of a relation between the Iberian BBC and the NW BBC.

    During my research following my discovery of this blog I have been mainly concerned with what the religion/culture of these NEBA people could be. I’ve seen arguments on here suggesting that NEBA was less warlike than the EEF, which I agree with to some extent, in the sense that they’d be less concerned with protecting their own territory and destroying nearby competitors (they are nomadic after all and so could just move to another region). However, the notion that they were “peaceful” (by ancient standards) to me runs counter to everything we know about the core worldview of later steppe peoples, who seem to be universally more violent/ambitious than their settled neighbors.
    For all the flaws in the Steppe Hypothesis, I would at least argue that the IE morality/worldview does to some degree align well with pastoralists such as the WSH.
    However, it’s apparent through archaeology that the NEBA cultures prior to the IE expansion were heavily solar-focused, more so than any IE religion I know of; those tend to venerate fire and storm gods most highly. You even see the use of solar sites such as Stonehenge decline precisely when the Celts are argued to have arrived via the Urnfield culture, whereas no such decline was observed during the transition from Farmers to BBs.

    If you have any ideas regarding their religion and/or culture I would be very interested to hear them; however, I will concede my speculation regarding the BBs “converting” to a Vasconic language.

  5. @Atkan

    Yes, I guess that from the point of view of the CWH being Indo-European, it could make sense to link Andronovo to Tocharian (since it’s impossible to link it to being Indo-Iranian originally). But even then it’s a bit of a stretch in terms of timing. The archaeological features linked to Indo-Iranian appear in early Andronovo, particularly in the eastern part which would be the one to first reach the Tarim Basin. The latest for pretty much all of the Andronovo horizon to be Indo-Iranian would be around 1400 BC. So a group moving to the Tarim Basin (or close to it) ca. 1500 BC would theoretically be possible if they had been a bit isolated before migrating there.

    That’s of course assuming that the CWH (or at least its non-western part) was IE, which there’s no evidence for.

    I still think that the IAMC population offers a very easy explanation, with evidence in the archaeology, genetics and, crucially, linguistics (the idea of Tocharian coming from steppe populations, whether Afanasievo or Andronovo, has some linguistic problems related to agricultural vocabulary of PIE origin present in Tocharian that’s hard to link to the steppe).

    Regarding the CWH being peaceful or not, I don’t think there’s a good way to define whether a population is more or less peaceful in an essential way. It all depends on the circumstances. EEFs could be peaceful at many times and violent at others. The same with the steppe people. I don’t think one can consider the populations as a whole as peaceful or violent across space and time. The only thing that matters when it comes to the relationship between the steppe people and the EEFs they met across Europe is that: 1 – They integrated EEF women into their communities, and 2 – They didn’t mix with them culturally, economically or otherwise, since the steppe communities thrived while the Neolithic ones died out. If both communities had established good relationships and broad exchange (of culture, goods, etc…) it should be expected that both of them would have evolved in the same way. But this clearly didn’t happen.

    About the religion/cultural beliefs of the steppe people, that’s really hard to answer. Throughout the LBA, and even more during the IA, they probably changed quite drastically (as they switched to IE languages too), so most of it was lost by the time we have some records. One could imagine a relatively simple belief system given the early stage of development they were in (pastoralist groups, organised in clans), so maybe not very different from that of later steppe groups that didn’t interact extensively with more developed societies the way the Andronovo and Scythians did: Turks, Mongols…

  6. @Alberto

    I agree it likely had a more “primitive” and shamanistic nature like the Turkic or Mongolic religion as this seems broadly typical of pastoral societies as a whole.

    Whilst we obviously have no explicit records of their religion, there are various traits we can observe across the CWH, particularly in the Bell Beaker culture. The sun discs are the most prominent feature: reverence for the Sun and its cycles seems to have at the very least been more prominent than it was later amongst the Indo-Europeans (Phase 2 of the Nordic Bronze Age preserves solar motifs but relates them to a “Sun Chariot” rather than a standalone disc).

    On top of this there are certain features in western European mythology that I’ve been unable to find clear comparisons of in Vedic or Iranic culture. For example, there is the goddess An Morrigan, who some speculate is cognate to the Basque goddess Mari, as her name isn’t conclusively Indo-European. There is also folklore surrounding the Fae/Fairies which seems to be unique to Northern and Western Europe.

    Overall the trend I observe in Europe definitely implies a “religious substrate”, and should the origin of Vasconic on the Steppe be proven, there is a definite implication that this “substrate” is Steppe in origin, as the pre-Christian Basque country shows the clearest equivalents to this substrate, which the Indo-Europeans must not have had a mechanism to assimilate.

  7. @Atkan

    That could be an interesting field of research. I personally haven’t looked into any “religious” substrate in Europe, but there could well be one that is pre-IE.

    One problem is that the largest part of the Basque pre-Christian mythology is lost, and Iberians had significant influence from the Eastern Mediterranean already since the EBA, becoming quite strong in the EIA. So the only other way to know is comparing to IE cultures from Asia and see what’s common to Europe but not to Asia (or the Southern Balkans).

    Right now it’s really difficult to correctly identify any pre-IE substrate in Europe because of the traditional view that things that are common throughout Europe must be so because they’re IE. That assumption has led PIE (language, culture, religion, etc…) to become heavily contaminated with elements that came from the steppe. Until it becomes widely accepted that steppe = non-IE (and that it was all over Northern and Western Europe) it will be difficult to see progress being made.

  8. While on the subject of substrates, looking at Krahe’s Old European Hydronymy maps, I noticed that the area includes the East Baltic, which is relevant in this context given that Neolithic farmers never settled that region. The first farmers there were the CWC people.

    Also interesting in this respect is what they mention further down, where the examples include the river name Isar, presumed to be Celtic (X. Delamarre), and the Proto-Slavic *ezero (lake) from Proto-Balto-Slavic *éźera or *áźera (lake) (Old Prussian: assaran, Latgalian: azars, Lit: ẽžeras), which has a dubious PIE etymology, while there’s Basque aintzira (lake).

    There are so many words in Europe that are assumed to be IE but are only attested in European languages, often to the exclusion of Greek, and can’t be traced to any other language (since Basque alone is a very limited source). With the current knowledge a more comprehensive analysis could be done, leaving old assumptions behind and checking against the possible languages that may come from the steppe (I have referred to them in the article as NEBA languages), which could yield some interesting results. Though for this work to be done we’ll first have to wait for the mainstream papers dealing with the PIE question to be able to bring out this information.
