How math was recruited to invent the Jewish people

Eran Elhaik
Jan 24, 2023

“Who is a Jew?” is a debate that finds its roots in the Iron Age Kingdom of Judah and its conquest by the Achaemenid Persian Empire. The Hebrew word ‘Yehudi’ (Jew in English) has been used at least since 539 BCE to refer to the inhabitants of the conquered Kingdom, now called Yehud. ‘Yehud’ contributed its name to ‘Yehudi’ (Jew), initially for administrative purposes to denote someone from Yehud. However, it was not so simple. The Judaean deportees were the first to grapple with the challenges posed by this new concept of ‘Yehudi.’ They were from Judea, but they also retained their Israelite tribal affiliations. Should Yehudi be used as an honorific or as a disrespectful term (to criticize the people of Yehud)? The terms ‘Yehudi’ and ‘Yehudim’ are very rare in the Bible, appearing only 75 times, mostly in the Book of Esther, which shows how gradually they were adopted. At that time, circumcision began spreading for various reasons unrelated to religion. While over time it became a hallmark of Judaism, it was also its undoing, as it repelled new believers and prompted the successful spread of Christianity. As circumcision has always been practiced by non-Jews, it could never be the hallmark it was envisioned to be. The millennia-old debate concerning “Who is a Jew?” thereby persisted to our time, eventually becoming a pseudo-scientific question that geneticists have been tackling since the end of the 20th century.

Geneticists typically seek DNA markers (mutations) that are unique to specific groups and allow one group to be differentiated from others. Over the years, several candidate markers, such as the Cohen-modal haplotype on the Y chromosome (allegedly identifying members of a priestly class) and even the BRCA genes, were proposed as genetic hallmarks for Jews. However, none of those markers performed as hoped, i.e., existed in most or all Jews while being absent from most or all non-Jews, and the search continued (Elhaik 2016).
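To make the "hallmark" criterion concrete, here is a minimal sketch of how one might score a candidate marker: the share of group members who carry it (sensitivity) and the share of non-members who lack it (specificity). The function name and all counts are hypothetical, invented purely for illustration, and are not taken from any study.

```python
def marker_performance(carriers_in_group, group_size, carriers_outside, outside_size):
    """Return (sensitivity, specificity) of a marker used as a group hallmark."""
    # Share of group members who carry the marker.
    sensitivity = carriers_in_group / group_size
    # Share of non-members who do NOT carry the marker.
    specificity = 1 - carriers_outside / outside_size
    return sensitivity, specificity


# Hypothetical counts, invented for illustration only.
sens, spec = marker_performance(
    carriers_in_group=30, group_size=200,
    carriers_outside=50, outside_size=1000,
)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

A true hallmark would need both values close to 1. A marker that is rare inside the group, or that also turns up outside it, fails the test no matter how strongly it is associated with the group.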

In the early 21st century, scientists no longer bothered themselves with individual markers that, frustratingly enough, popped up later in non-Jewish populations. Instead, they analyzed hundreds of thousands of markers together, using complex mathematical tools that they did not fully understand. One of those mathematical tools, Principal Component Analysis (PCA), allowed condensing the complex genetic dataset into a much simpler dataset that could be visualized by a simple, colorful scatter plot. In a previous article, I demonstrated the dangers of this tool and how it shaped the political career of Sen. Elizabeth Warren. From that article, it should be clear how anyone can produce their favorite results using PCA and why PCA became geneticists’ best friend forever, northern star, crystal ball, used tea leaves, and Wish Bear, all combined.
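For readers who have never seen it done, the condensing step can be sketched in a few lines. This is a minimal illustration on synthetic genotypes (coded 0/1/2), assuming a plain SVD-based PCA. It is not the pipeline used in any of the studies discussed here, and the helper name `pca_scores` is my own.

```python
import numpy as np


def pca_scores(genotypes, n_components=2):
    """Project individuals (rows) onto the top principal components."""
    # Center each marker (column) so the PCs describe variation around the mean.
    centered = genotypes - genotypes.mean(axis=0)
    # SVD of the centered matrix; U * S gives each individual's PC coordinates.
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


rng = np.random.default_rng(0)
# Synthetic data: 60 individuals x 1,000 markers with random allele frequencies.
freqs = rng.uniform(0.1, 0.9, size=1000)
genotypes = rng.binomial(2, freqs, size=(60, 1000)).astype(float)

scores = pca_scores(genotypes)
print(scores.shape)  # (60, 2): two coordinates per individual, ready for a scatter plot
```

A thousand columns become two, and everything the scatter plot shows depends on which rows and columns went into the matrix in the first place.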

Soon enough, PCA was harnessed to tackle the millennia-old question: “Who is a Jew?”

In 2009, David B. Goldstein led a study (Need et al. 2009) that claimed, based on PCA, that Jews (i.e., Ashkenazic Jews [AJs]) are genetically distinct from non-Jews (i.e., Europeans).

Goldstein et al. (2009), Link to the paper

Their PCA results were a devastating blow to the “Jews are not a race” proponents. Goldstein concluded that AJ genomes carry an “unambiguous signature of their Jewish heritage… this seems more likely to be due to their specific Middle Eastern ancestry than to inbreeding.”

Other authors followed in these footsteps, cementing the racial identity of Jews and their Levantine Biblical-like origins and enshrining PCA as the ultimate Truth Sayer device on ancestry, genealogy, history, evolution, epidemiology, and biogeography, all in one plot! After all, math has spoken! “The evidence for biological Jewishness has become incontrovertible,” declared Harry Ostrer (2012), who offered to settle land disputes in Israel according to the magnitude of the Middle Eastern ancestry in one’s genome, in line with the Zionist vision, at least the way he understood it. Ostrer’s offer was extremely generous towards the Palestinians and Bedouins, whose genomes have 56–59% of that ancestry (Das et al. 2016), compared to AJs, who are already a minority between the Jordan River and the Mediterranean Sea, with only 50–0% Middle Eastern component (Elhaik 2017).

Putting aside Ostrer’s gift for diplomacy, the question of the prophetic powers of PCA remains: can it really be used to differentiate Jews from non-Jews without even being curious about their genitalia? This was no longer a theoretical question, as direct-to-consumer ancestry companies, like 23andme, had already adopted PCA to assess ancestry, disease risk, and “cultural traits,” whatever those are. Soon enough, “genetic Jewishness” became a product to be purchased, and genetic Ashkenazic origins a trophy to cherish, no matter how minuscule that trophy was. Math took over where orthodoxy failed and picked up the fight lost to Christianity 2,000 years ago, offering the shortest possible route to Jewishness, with intactivists welcome more than ever. But was it real?

In my recent paper (Elhaik 2022), I showed that PCA results are not reliable, robust, or replicable. I demonstrated how expert users could easily manipulate PCA to generate any desired results (as ridiculous as they may be). Is it possible that this is what Goldstein and his colleagues did? To answer this question, let us first replicate their result along with their poor terminology (A in the figure below). Using the same approach, I can use PCA to show that Turks are distinct from non-Turks (B). Are they also a race? I can show that AJs and Turks either cluster, which by PCA logic indicates identity (C), or not, just because (D), and that AJs cluster with Spaniards (D), creating conflicting results.

Elhaik (2022), Link to the paper.

The trick with PCA should be evident by now! One can select the markers, number of individuals, and populations that will almost always give the desired results (here, I only manipulated the populations). Showing that PCA creates conflicting results should be enough to disqualify it as a scientific utility. Yet, although scientists noticed that, they kept going back to their Wish Bear, drawing further conclusions about AJs’ origins. Let us examine these claims too.
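The population trick can be reproduced with a toy simulation. The sketch below is not the analysis from the paper. It uses four synthetic populations (A, B, C, D) whose allele frequencies I invented, a plain SVD-based PCA, and a crude "separation" score of my own: the distance between two group centroids on the PC1/PC2 plot divided by the overall spread of the plot. The same individuals from A and B are analyzed twice, once alone and once alongside two distant outgroups.

```python
import numpy as np


def pca_scores(genotypes, n_components=2):
    """Project individuals (rows) onto the top principal components via SVD."""
    centered = genotypes - genotypes.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def separation(scores, labels, a, b):
    """Centroid distance between groups a and b, relative to the plot's overall spread."""
    gap = np.linalg.norm(scores[labels == a].mean(axis=0) - scores[labels == b].mean(axis=0))
    spread = np.sqrt((scores ** 2).sum(axis=1).mean())  # RMS distance from the plot's center
    return gap / spread


rng = np.random.default_rng(1)
n_markers, n_per_pop = 2000, 25
base = rng.uniform(0.3, 0.7, n_markers)

# Hypothetical allele frequencies: B is a slight perturbation of A; C and D are unrelated.
pop_freqs = {
    "A": base,
    "B": np.clip(base + rng.normal(0, 0.08, n_markers), 0.05, 0.95),
    "C": rng.uniform(0.05, 0.95, n_markers),
    "D": rng.uniform(0.05, 0.95, n_markers),
}
# Draw each individual's genotypes (0/1/2) once, so both analyses reuse the same people.
data = {
    name: rng.binomial(2, f, size=(n_per_pop, n_markers)).astype(float)
    for name, f in pop_freqs.items()
}


def run(names):
    """Run PCA on the chosen populations and report how far apart A and B appear."""
    X = np.vstack([data[name] for name in names])
    labels = np.repeat(names, n_per_pop)
    return separation(pca_scores(X), labels, "A", "B")


sep_two = run(["A", "B"])             # A and B analyzed on their own
sep_four = run(["A", "B", "C", "D"])  # same A and B, plus two outgroups
print(f"A vs B alone: {sep_two:.2f}   A vs B with outgroups: {sep_four:.2f}")
```

In this setup, A and B should look like two distinct groups when analyzed alone and sit nearly on top of each other once C and D are added, even though not a single genotype in A or B changed. Which picture gets published is up to whoever picks the populations.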

The next test series (see the figure below) showed that AJs (solid green circles) are a “population isolate,” a group separated from all other populations (A), as the “Jews are a race” school upholds. I can also show that AJs cluster with Caucasus populations in support of their origin from Ancient Ashkenaz (Das et al. 2016) (B). I can show that AJs cluster with Amerindians, which must be due to the north Eurasian or Amerindian origins of both groups (C). Could these exciting results be used to support legal claims for Jewish resorts with casinos in places like Brooklyn? I can also show that AJs cluster closer to South Europeans than Levantines (D) and may be entitled to EU passports! AJs who can no longer take Tel Aviv’s heat and humidity may find relief in their overlap with Finns, solid evidence of their ancient Finnish origin (E). Those who insist on living in the Promised Land can find comfort in the last analysis, which not only refutes all our previous findings but also proves that half of the AJs are of Finnish origin and the remaining half have the lucrative Levantine origin. I can only hope that each half will find their grouping satisfactory.

Elhaik (2022), Link to the paper.

These examples demonstrate how genetic tools can be abused to support imaginary historical narratives. PCA earned its place as the most popular tool in genetics precisely because of its great flexibility, which means that none of those results can be trusted.

Looking at the enthusiasm of scientists for PCA reminds me of Shakespeare’s Macbeth, who probably described it best: “a tale told by an idiot, full of sound and fury, signifying nothing,” or, in a free translation to modern English, “anyone can do PCA and use it to create a fancy plot that tells a great story that lacks any statistical significance.”

The question of “Who is a Jew?” shall remain open, perhaps forever, as it was never posed as a scientific question but rather as a dilemma forged by the unique historical circumstances in Yehud. The conversions made by Rabbi PCA will need to be undone, at least for those who did not ask for them in the first place.

The fate of the Ten Lost Tribes and that of the people of Yehud are some of the most fascinating questions in history, and fortunately, we do not require PCA to answer them. Instead, projects like Ancient DNA Origins (full disclosure: to which I contributed), which employ novel machine learning tools with ancient DNA from Israel to study the heritage and legacy of the ancient Israelites and our connection with them, have already made remarkable discoveries.


Das R, et al. 2016. Localizing Ashkenazic Jews to primeval villages in the ancient Iranian lands of Ashkenaz. Genome Biol. Evol. 8:1132–1149.

Elhaik E. 2016. In search of the jüdische Typus: a proposed benchmark to test the genetic basis of Jewishness challenges notions of “Jewish biomarkers”. Front. Genet. 7.

Elhaik E. 2017. Editorial: Population Genetics of Worldwide Jewish People. Front. Genet. 8.

Need AC, et al. 2009. A genome-wide genetic signature of Jewish ancestry perfectly separates individuals with and without full Jewish ancestry in a large random sample of European Americans. Genome Biol. 10:R7.

Ostrer H. 2012. Legacy: a genetic history of the Jewish people. Oxford: Oxford University Press.

Elhaik E. 2022. Principal Component Analyses (PCA)-based findings in population genetic studies are highly biased and must be reevaluated. Sci. Rep. 12:14683.