No, Levantine is not a “dialect of” Arabic

Published in East Med Project: History, Philology, and Genetics · Jan 2, 2018

SUMMARY: Lebanese (more broadly North Levantine) is influenced by Arabic (as well as other languages, such as Aramaic and Canaanite/Phoenician, plus its own local evolution), but it is not a version of Arabic (nor is it a version of Aramaic or Phoenician). It is at least 3/4 of a language away from Classical Arabic (fus7a). Many of the claims establishing “descendance” are statistically unsound.

[Since I wrote this piece there has been research by Ahmad Al-Jallad and Marijn van Putten effectively proving that the Levantine and other vernaculars do not descend from fus7a, a parallel language largely constructed by grammarians during the Nahda. The notion that Levantine and other vernaculars are “lahjé” of fus7a is effectively a politically motivated fabrication by Arab fascists who aimed at degrading localism. Classical Arabic is a cousin, not an ancestor of Levantine.]

[Like every schoolchild in Lebanon I have a profound, unmitigated love for the Arabic language and an admiration for its poets. But it does not mean that I should swallow the flimsy theory that Lebanese (or N. Levantine) — which I find less attractive — is a minor vulgar inferior subset of it.]

The Levantines have been saying “bét” for at least 3200 years; now they say “bét” but it is suddenly from a “dialect” of Arabic. It is foolish to think that a population will speak a language, say Aramaized Canaanite, plus (mostly) a local language, then suddenly, tabula rasa, switch to another one for the same words.

Rivers flowing uphill

It would be an anachronism to assert that Italian is a dialect of Catalan, but safer to say that Italian comes from (vulgar) Latin. But when it comes to Lebanese (more generally NorthWestern Levantine), the “politically correct” Arabist-think-tank view (low-intellect Westerners trained into something called “Middle East Studies”) is that it is derived from Classical Arabic (calling it the Lebanese “dialect” of Arabic) to accommodate sensitivities; even linguists find circular arguments to violate the arrow of time to serve the interest of panArabism. In situations where there are similarities between a word used in Leb and Arabic, they insist it is derived from Arabic, not from a common root of both. (Most Lebanese are confused by diglossia, as one is not supposed to write in the spoken language.) Unlike Indo-European languages, Semitic languages have too much criss-crossing of roots and too much areal diffusion to assert clean descendance, hence statements such as “A is a dialect of B” don’t have the certainty and neatness found elsewhere; establishing them, we will argue, requires orthogonal factors. Even Arabic is ill defined (historically, it may be referencing a nonpeninsular Western population) and the definition of “Arabic” is largely circular. To make things more complicated, what linguists call “Arabic” isn’t Classical Arabic but some hypothetical construction arbitrarily called “Proto-Arabic” (it could have been called “Middle Semitic”), so linguists and politicians don’t even mean the same thing. Even linguists get confused about the circularity and forget their own definitions, and their Arabist students are even more confused.

Write in Lebanese/Levantine/Modern Canaanite!

Regardless of its origin, there is no point insisting on degrading the spoken language. It remains that Classical Arabic sounds soooo foreign (especially to people who didn’t study in it), which explains why people send notes in French or English, not Arabic. (Data point: I sold 97% of my books in French and English in Lebanon, 3% in Arabic. I don’t have among my friends any Leb of my generation or younger who reads Arabic except for legal docs, and just about not a single one who can actually write it. I have almost never received a letter/email in Arabic from another Lebanese.) Use any character set you want: Latin (which is of Phoenician origin) is simpler because you don’t need a special keyboard/effort to switch, but any alphabet works politically (including the Arabic one) since they are all of Phoenician origin!

The points here are 1) Lebanese (more generally NorthWestern Levantine, neo-Canaanite) is to be treated as a standalone Semitic dialect (or language) that descends from other known and unknown languages, including Arabic (which itself was influenced by the same predecessors), but has not inherited from it as much as marketed (broken plurals but not its rich verb forms). And never ignore its own developments, by its own evolution, separately from other Semitic languages. 2) Its grammar, as we will see below, remains largely nonArabic. Many words that are in both Leb and Arabic but not common in Aramaic happen to be in North-Phoenician (Ugaritic). 3) Its vocabulary largely predates Arabic (even in cases where we got what appear to be Arabic innovations). I took a list of the statistically most frequently used words (by Zipf’s law, > 80% of vocabulary), looked for words that exist in both Leb and Akkadian or Ugaritic (North Phoenician), and showed that very, very few exist in Arabic but not in other Semitic roots, hence could only have come from Arabic. (Lameen Souag did the same with a poem by Said Akl, but without statistical methodology or any understanding of probabilistic notions such as the Brownian Bridge.)

More technically, as a probabilist, I use the standard Brownian Bridge Argument: if something was at X at period t-\delta t and is at x at period t+\delta t, then the degrees of freedom as to where it was at period t are constrained. In other words, statistical priority should always be given to the earlier root.
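To make the constraint concrete, here is a minimal sketch of the textbook Brownian bridge formulas. Treating a word form as the position of a random walk is an illustrative assumption, and the function names and the volatility parameter `sigma` are hypothetical (not from the original argument). The point it shows: when the process is pinned at the same value before and after, the best guess for the middle is that same value, and the uncertainty is half of what it would be with only the earlier observation.

```python
def bridge_mean_var(x_before, x_after, t_before, t_after, t, sigma=1.0):
    """Mean and variance at time t of a Brownian motion pinned at both ends.

    x_before is the value observed at t_before, x_after the value at t_after.
    sigma is the (assumed) volatility per unit of time.
    """
    if not t_before < t < t_after:
        raise ValueError("t must lie strictly between the two pinned dates")
    span = t_after - t_before
    weight = (t - t_before) / span
    # The conditional mean is a linear interpolation between the two pins.
    mean = x_before + weight * (x_after - x_before)
    # The conditional variance vanishes at both pins and peaks in the middle.
    var = sigma ** 2 * (t - t_before) * (t_after - t) / span
    return mean, var


def free_var(t_before, t, sigma=1.0):
    """Variance at time t when only the earlier observation is known."""
    return sigma ** 2 * (t - t_before)


# Same form observed one period before and one period after (coded as 0.0).
mean_mid, var_mid = bridge_mean_var(0.0, 0.0, -1.0, 1.0, 0.0)
print("pinned at both ends:", mean_mid, var_mid)
print("pinned at the start only:", free_var(-1.0, 0.0))
```

Under these assumptions, knowing the later observation halves the variance at the midpoint, which is the sense in which "neither then nor now" makes a detour in the middle unlikely.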

The discussion on Youtube explaining why the classification requires independent categories.

Methods used by Semiticists are not scientific. Further, **areal diffusion** makes transmission arrows very fuzzy. The only right way to proceed is by **PCA, which explains why as a statistician/information theory professional I am offended to see this jargon-filled crap called “scholarship”.**

4) The “Arabization” mission promoted by the American University of Beirut in the 1860s (starting with the (re)translation of the Bible) seems to infect the most low Intellect (II) Westerners of the think tank/State Department Arabist types (Western losers you meet at conferences), not locals; most people who disagree with the point and support the orthodoxy don’t speak either Leb or Aramaic, or fail in basic reasoning (talk to Syriac scholars).

Note that Anglo-Saxon low-Intellect Arabists (originally Protestant preachers) pushed Lebanon to be part of the “Middle East”, while both the Catholics (Italian/Provencal/French) and the Ottomans positioned it culturally as part of the Eastern Mediterranean.

5) The Latin alphabet (actually Phoenician) lends itself better to Lebanese, with such local vowels as é and o, but that’s another note.

6a) Unlike genetics, which has rigorous mathematical formulations and clear-cut flows, linguistic categories are fuzzy and, for Semitic languages, monstrously unrigorous.

6b) From a scientific standpoint, linguistic claims that Lebanese is a dialect of Arabic (or some conveniently abstract construct called Proto-Arabic) are a) totally unrigorous handwaving believed from sheer repetition, b) fitted to a few rules made on the fly (and subject to overfitting: you pick the rules that make a language part of the group you like), c) accompanied by occasional brushing off of the notion of mutual intelligibility between Leb and Arabic (or Proto-Arabic); all of these are presented without any attempt to meet minimal standards of scientific evidence. (What qualifies me to write this? As we will see further down, linguists play geneticists with “A comes from B, not C” without letting you know that it is much, much fuzzier since B also comes partly from C. As a statistician, I am repelled by the mixing of causal assertions in the presence of dependent variables.)

6c) Linguistic models suffer from a realism problem (long on theory, short on practice), owing to their very poor scientific standards and empirical mapping. This explains why the late Frederick Jelinek, the author of the magisterial Statistical Methods for Speech Recognition, was highly critical of the informational value of linguists’ heuristics. His criticism is well known through the quip: “Every time I fire a linguist, the performance of the speech recognizer goes up”. In fact linguistic distance can only be seen as a statistical/information distance problem, best handled via deep learning methods in the absence of orthogonal factors.

7) The low Intellect Arabists and Middle East Studies “Experts” call Lebanese “Lebanese Arabic” but their Slavic studies peers don’t use “Bulgarian Proto-Russian” or “Serbian Russian”.

What do people call “Arabic”?

In a skit an ISIS man goes to a Christian Lebanese village, Zghorta, and shouts in Classical Arabic (“raise your hands!” “ارفع يديك”) to a Zghorta villager, who answers him “speak to me in Arabic!” (7ki ma3é 3arabé). Likewise, in Saudi Arabia, I once heard a Lebanese fellow asking the hotel manager: “don’t you have Arabic food?” (meaning East Med/Lebanese) as all they had was… Arabic food (Saudi preparations of rice etc.)

The White Mountain (Mount Lebanon) from my window in Amioun

The very etymology of “Arabic” has confused people, since it may mean “Westerner”, that is, nonArab (and homonym with 3araba which might be another root). “Speak to me in Arabic” may mean “speak to me intelligibly” (3arabé mshabra7), since 3arab means grammatical and intelligible, and people got confused about what language they were speaking.

Anachronism

Many people who are fluent in Levantine and classical Arabic fail to realize that the distance between the two is greater than between many languages deemed distinct, such as French and Romanian… Slavic “languages” such as Ukrainian and Polish are much, much closer to one another than Levantine and Arabic. Same with Scandinavian and Germanic languages. Also note that if Arabians share vocabulary with Lebs, it is because of two-way flow.

(If the Lebanese know Arabic, it is from the education system and television, not from speaking it.)

Collinearity and Other Flaws by Linguists

Collinearity doesn’t allow strong categorization: Traditional linguistics categorizes languages as independent variables, failing to take into account collinearity, i.e., if Y = a_1 X_1 + a_2 X_2 + \eta (noise) and X_1 and X_2 are correlated, the effect will show loading in a_1 or a_2, not both. So if Levantine resembles Arabic, and Arabic resembles Aramaic, and Aramaic resembles Canaanite/Phoenician/Hebrew, and to make things worse, Arabic also resembles Canaanite, the tendency is to believe that Levantine comes from one (the a_1 with the highest load), not another.
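The instability of the loadings can be illustrated with a small simulation, offered as a sketch under stated assumptions rather than as the author's computation. The data are purely synthetic: Y is generated with true loadings a_1 = a_2 = 1, and in the collinear case X_2 is X_1 plus a little noise (the 0.05 noise scale, sample sizes, and function names are arbitrary choices for illustration). The regression is ordinary least squares without intercept, solved through the 2x2 normal equations.

```python
import random
import statistics


def fit_two_regressors(x1, x2, y):
    """Least-squares fit of y = a1*x1 + a2*x2 (no intercept)."""
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * c for a, c in zip(x1, y))
    s2y = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12  # close to zero when x1 and x2 are collinear
    a1 = (s22 * s1y - s12 * s2y) / det
    a2 = (s11 * s2y - s12 * s1y) / det
    return a1, a2


def simulate(collinear, trials=200, n=200, seed=0):
    """Refit the same model on many synthetic samples; true a1 = a2 = 1."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        x1 = [rng.gauss(0, 1) for _ in range(n)]
        if collinear:
            x2 = [v + 0.05 * rng.gauss(0, 1) for v in x1]  # near-copy of x1
        else:
            x2 = [rng.gauss(0, 1) for _ in range(n)]  # independent of x1
        y = [a + b + rng.gauss(0, 1) for a, b in zip(x1, x2)]
        estimates.append(fit_two_regressors(x1, x2, y))
    return estimates


indep = simulate(collinear=False)
coll = simulate(collinear=True)
spread_indep = statistics.stdev([a1 for a1, _ in indep])
spread_coll = statistics.stdev([a1 for a1, _ in coll])
mean_sum_coll = statistics.mean([a1 + a2 for a1, a2 in coll])

print("spread of a1, independent regressors:", round(spread_indep, 3))
print("spread of a1, collinear regressors:  ", round(spread_coll, 3))
print("mean of a1 + a2, collinear case:     ", round(mean_sum_coll, 3))
```

In the collinear case the sum a_1 + a_2 stays well estimated while the split between the two swings widely from sample to sample, so whichever regressor happens to get the higher load in a given sample says little about the true source.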

Accordingly, simplified linguistics fails with Semitic languages because of confounding, which is much more consequential with Semitic tongues than with Indo-European ones. In English we know that what comes from Latin has no collinearity with Northern European sources, except for remote roots.

So if someone claims: Leb is a dialect of (Arabic/Aramaic/Zorgluz…) it is a weaker statement than Italian is a dialect of Latin. We should say: Leb is a dialect of Semitic.

The only remedy is to do, as in genetics, PCAs (orthogonal variables that are abstract), hence show Semitic languages represented as dots on a 2–3D map with orthogonal bases. This is not done by Semiticists, and I consider the linguistic critiques of this piece invalid and highly unscientific (not even at the level to be wrong).
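As a rough sketch of what such a map involves, the snippet below runs a bare-bones PCA (power iteration on the covariance matrix, with explicit orthogonalization) in plain Python. The six rows `lang_A` to `lang_F` and their three feature scores are invented placeholders chosen so the result is easy to check; they are not linguistic measurements, and a real study would need an agreed feature matrix for actual languages.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def covariance(rows):
    """Center the columns and return (centered rows, sample covariance matrix)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    return centered, cov


def principal_components(cov, k=2, iters=300):
    """Top-k eigenvectors of cov by power iteration, kept mutually orthogonal."""
    d = len(cov)
    components, variances = [], []
    for _ in range(k):
        v = normalize([float(i + 1) for i in range(d)])
        for _step in range(iters):
            v = [dot(row, v) for row in cov]
            for c in components:  # remove directions already extracted
                proj = dot(v, c)
                v = [a - proj * b for a, b in zip(v, c)]
            v = normalize(v)
        components.append(v)
        variances.append(dot(v, [dot(row, v) for row in cov]))
    return components, variances


# Hypothetical placeholder scores (NOT real linguistic data).
names = ["lang_A", "lang_B", "lang_C", "lang_D", "lang_E", "lang_F"]
scores = [[2, 2, 0], [-2, -2, 0], [1, -1, 0], [-1, 1, 0], [0, 0, 1], [0, 0, -1]]

centered, cov = covariance(scores)
components, variances = principal_components(cov, k=2)
pc1, pc2 = components
for name, row in zip(names, centered):
    # Coordinates of each item on the two orthogonal axes of the "map".
    print(name, round(dot(row, pc1), 3), round(dot(row, pc2), 3))
```

The two axes are orthogonal by construction, which is the property the argument asks for: each item gets coordinates on abstract, uncorrelated factors instead of being assigned to a single parent.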

Areal Influence: If there is a continuum of dialects through the area, from the Levant to the fertile crescent, it can be due to areal features rather than genetic ones. In other words, lateral influences rather than vertical ones.

The Phyla and Waves Model used by Semiticists is not very convincing: we are not dealing with the clarity of genetics; “evidence” is not stochastically elaborated.

Next we look at the “markers of Arabic”, which may apply in many cases, but do not justify categorization.

So-Called “Markers” and “Features” of Arabic

Next we will look at the weakness of on-the-fly marking as linguistic demarcation, without too many details (a more technical note to follow). Further, the method of markers/features used by Semitic linguists suffers from an elementary inferential flaw: it consists in finding features/markers in

Arabic => confirming in Lebanese,

not the more rigorous: finding, in addition, features/markers in

Lebanese => confirming in Arabic.

In other words it ignores developments that are uniquely local. That assumes that their markers are really features.
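The asymmetry between the two directions of confirmation can be sketched with plain set arithmetic. The marker labels `m1` to `m12` and the two sets `lang_p` and `lang_q` are hypothetical placeholders, not an inventory of real features; the functions are ordinary overlap ratios, used here only to illustrate the one-directional check described above.

```python
def directional_overlap(source, target):
    """Share of the markers picked in `source` that are confirmed in `target`."""
    return len(source & target) / len(source)


def jaccard(a, b):
    """Symmetric overlap: shared markers over all markers seen in either set."""
    return len(a & b) / len(a | b)


# Hypothetical marker inventories (placeholders, not real linguistic features).
lang_p = {"m1", "m2", "m3", "m4"}
lang_q = {"m1", "m2", "m3", "m4", "m5", "m6",
          "m7", "m8", "m9", "m10", "m11", "m12"}

print("P => Q:", directional_overlap(lang_p, lang_q))
print("Q => P:", directional_overlap(lang_q, lang_p))
print("symmetric:", jaccard(lang_p, lang_q))
```

Checking only P => Q reports a perfect match, while the reverse check and the symmetric ratio both show that most of Q's markers (its local developments, in the terms used above) have no counterpart in P.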

Five Vowels: Northern Levantine uses the French é sound (the diacritical rboso, present in Syriac) where Arabic has an “i” (kasra) or long i, in addition to it. (batyté, Ghassén, etc.) (Zré2 is arabized as Zurayq at the American University of Beirut. Someone should tell them.) It also uses the “o” as distinct from “oo”.

La as an object marker: shufto la Antoine? (Bassal, 2012)

la2: the glottal addition to la (negation). La2, not present in Arabic, is either Proto-Semitic or a local innovation.

Another flaw: mismeasurement of root change as distance. Using the Arabic innovation of a non-Arabic root (2rdh for 2rtz) should not allow one to classify the term, for scientific (informational) and cultural purposes, as “derived from” Arabic, even if it makes sense from a linguistic standpoint in a refined toolkit. So if someone has been saying lb (for heart) for a few thousand years, then added an aleph to make it ‘lb (2lb), is he to be treated the same as someone who was saying corre or schmorglub for heart and now says ‘lb? It is not the same distance! From a differential entropy standpoint, qlb in Arabic is largely lb. This is what linguists fail to get about their classification heuristics. Minor adaptations such as “al” for “ha” or “han” should not be a basis for calling it a change of language. It is no different for Hebrew, where Ashkenazis use a Germanic pronunciation for gutturals, which doesn’t make them speak a variety of German. Linguistic classifications are a mess!
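A crude way to put numbers on "not the same distance" is the Levenshtein edit distance between the transliterated forms quoted in this paragraph. This is a swapped-in stand-in, not the differential entropy measure invoked above, and it depends on the arbitrary Latin transliteration; the sketch only shows the order of magnitude of the gap.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, or
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


target = "2lb"
for earlier in ["lb", "corre", "schmorglub"]:
    print(earlier, "->", target, ":", levenshtein(earlier, target))
# lb -> 2lb : 1
# corre -> 2lb : 5
# schmorglub -> 2lb : 8
```

Moving from lb to 2lb costs a single edit, whereas arriving at 2lb from an unrelated form costs roughly the whole word, which is the distinction the classification heuristics are accused of ignoring.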

Strong a “2” (Basta Aleph): Lebanese has an emphatic silent “a”, known as the “Basta” accent (“shu b22ello?”), also found in other parts for other words: “ya 22alla” in Amioun (I’ve heard it sometimes in Syriac when they say “22aloho”). Roger Makhlouf’s idea is that the Arabic strong “ص”, “ض”, “ط” etc. are just consonants followed by emphatic 2a: “t22aleb”, “d22arab”, “shu s22ar?”. Hence, in the presence of the 22a, which does not exist in Arabic, we don’t need these special consonants. Roger surmises that if we don’t have them, and since the Phoenician alphabet didn’t have them, the natural conclusion is to assume that we just never used them (by Brownian bridge: neither then nor now means unlikely in the middle). This may explain the possible absence of the Aramaic 3ayin switch in Lebanese (discussed below).

SVO: Arabic necessarily has a VSO structure, Verb-Subject-Object (zahaba el waladu ila il manzil vs lzghir ra7 3al bét); Lebanese not necessarily so (it varies). One can also notice a possible Armenian areal effect in some Beiruti neighborhoods, with the verb placed last: “Artine bi-Beriz kén” (Artin was in Paris) and “Artin laymouné biyekol”, instead of “Artin ken bi-Bériz”.

Simple Verb-Subject Agreement: The grammatical structure of Leb is somewhat similar to Aramaic. For instance, Leb uses the plural form for a verb before a plural subject; in Arabic the verb is singular.

Reflexive verb ending with the reflexive pronoun “lak”: killak shwayy, dakhkhinlak sigara, shuflak shi shaghlé, nimlak shi laylé hon,… Totally absent in Arabic. Found in Biblical lekh lekha… (rare for direct object). (Roger Makhlouf).

Nisba-ané: berrané, jewwéné is restricted in Arabic (Blau, 1967, cited by Bassal, 2012)

Diminution with on: dal3oun, mal3oun, etc. (Bassal, 2012)

Also note that Lebanese verbs mark tense not aspect.

Verb forms: Arabic has 15 forms (OK, OK, with 5 rare ones); Levantine and Aramaic have the same 4–6 forms (depending on regions). Notice that the present tense “yaktubu” (يَكْتُبُ) becomes 3am yiktob, 3am meaning “in the process of” in Aramaic.

The definite article: the “Al” in Arabic doesn’t exist as a prefix in Aramaic (it is suffixed), but does in Phoenician as ha 2a, and proto-Canaanite as hal and “l”. And it is not clear that old Lebanese distinguishes between lunar and solar, as Arabic does. Brownian Bridge: Leb may have retained a little bit of the prefixed definite article while under Aramaic influence.

Preposition fi: (from mouth, “f”), a marker of Arabic is absent from Lebanese. Ana bi-Amioun is Levantine for “I am in Amioun”. In Aramaic-Syriac (most versions) it would be “Ana bi-Amioun”. In Arabic “Innani fi-Amioun” (sometimes, but rarely, “bi”).

Mim-noon: the plural mim in Arabic (beytohom) becomes noon in Aramaic and North Levantine (beyton, beytkon). Even Ibrahim becomes Brohin.

Ma as negation: The classifiers claim that, among Semitic languages, a marker of Arabic is the negative “ma” for “la/o” in Canaanite. 1) “Ma” is hardly ever a full negation in classical Arabic: “iza ma”, “lawla ma” and “in ma” are usually read as “if”, “when” or “could be”, 2) “Ma” is a negation in Indo-European languages, so it came to the area to affect all languages, 3) “ma” is found in Biblical Heb. (Kings, 12:16).

Words that have hamze, turn into “y”, i.e. Mayy in Levantine is water (as in Aramaic), Ma2 in Arabic, etc, and the “y” in Arabic can become olaf: Yaduhu in Arabic is ido (Yad->Iyd) in both Syriac and Levantine.

Qad: the grammaticalization of the particle qad as a perfective morpheme, as in qad fa3ala (he has done), a marker of Arabic, doesn’t exist in Lebanese.

Nunation: a marker of Arabic (tanwin) is absent in Lebanese. Can be modernization, but nevertheless significant in informational distance.

Loss of the anaphoric or remote demonstrative use of the 3rd person pronouns: The third person pronouns are proper demonstratives in Western Semitic s.a. Hebrew (Al Jallad, 2017) but not Arabic. It seems to be the case with Lebanese. Ha-seper ha-hu (Heb.) is ktéb huwé as well as ktéb hayda.

An: Another marker (Al Jallad, 2017), Arabic uniquely uses the particle an(na) as a complementizer and subordinator, e.g. arada an yazhaba (baddo yrou7). It appears to be absent in Lebanese.

Canaanite and Phoenician shift: In Northern Lebanon, “Allah” becomes “Alloh”, “Taleb” is pronounced “Toleb”, even the y becomes “oy” (lésh in Beirut, loish in Bsharré. My first name is pronounced “Nsoym”). There is a joke that someone from Amioun went to buy an Ipad in Beirut and came back with an Ipod. But the shift is different from Eastern Aramaic, where Sarah is “Saro” while for us it is “Sora”.

Use of gerund as verb in Leb

Collection of transformations akh (Ar.) vs khayy (Leb), etc.

The 3ayn shift: We saw the emphatic “Basta” hamzé, 22a, argument earlier. But an argument (Souag) is that the dhad became a 3ayn in Aramaic (Eretz in Hebrew became Ar3a in Aramaic) but not in Leb, hence we got it from the Arabs. That is, there was a shift that stayed in Aramaic, while Levantine uses the Arabic dhad that does not have the shift (which is believed to imply that we did not get these words from Aramaic). But note that we know from Sibawayh’s al-Kitab that Arabs did not pronounce the dhad as a modified tzadeh (which shows that past pronunciations were not necessarily the same as current ones, and the question of the lughat al dhad is not settled). Note that in North Leb people may conflate ar3a with al3a, for ardh, as in Amioun. It may have come from Arabic, but it is far from certain.

(Note that broken plurals, about half of Arabic verbs, represent very little of a vocabulary, again, by Zipf’s law).
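The Zipf's law arguments in this piece (a small set of top-ranked words carrying most of actual usage) can be sanity-checked with a short calculation. The sketch assumes an idealized Zipf distribution with exponent 1 and a hypothetical vocabulary of 50,000 word types; neither number comes from the article, and real corpora deviate from the ideal curve.

```python
def zipf_coverage(top_k, vocab_size, exponent=1.0):
    """Share of running text covered by the top_k most frequent words,
    if the word of rank r has frequency proportional to 1 / r**exponent."""
    weight_top = sum(1.0 / r ** exponent for r in range(1, top_k + 1))
    weight_all = sum(1.0 / r ** exponent for r in range(1, vocab_size + 1))
    return weight_top / weight_all


VOCAB = 50_000  # hypothetical number of distinct word types
for k in (100, 1_000, 5_000):
    print(k, "top words cover", round(zipf_coverage(k, VOCAB), 3))
```

Under these assumptions a few thousand high-frequency words account for most of the running text, so what matters for measuring distance is where those words come from, not how many rare forms follow a given pattern.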

Another Central Flaw with Semitic Linguistics

Semitic linguists tend to equate language flows with population shifts, which, as I have shown in Taleb (2018), is wrong since languages renormalize but genes do not. In other words, Turkey speaks Turkic, India and the U.S. speak English, while gene flows show different dynamics.

Languages renormalize, get divorced from population flows

Grammar

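ARABIC vs LEVANTINE (Beirut, Amioun)

| Person | Arabic | Levantine (Beirut) | Levantine (Amioun) |
|---|---|---|---|
| 1s | Ana | Ana | ana |
| 2ms | Anta | inta | int |
| 2fs | Anti | inte | int |
| 3ms | Huwa | huwwe | hu |
| 3fs | Hiya | hiyye | hi |
| 2d | Antuma | into | ont |
| 3md | Huma | hinne | hinn |
| 3fd | huma | hinne | hinn |
| 1p | Na7nu | ne7na | ne7no |
| 2mp | Antum | into | |
| 2fp | Antunna | into | |
| 3mp | Hum | hinne | hinn |
| 3fp | Hunna | hinne | hinn |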

ARABIC vs LEVANTINE

| Person | Arabic (long a, 2) | Levantine (long eh) |
|---|---|---|
| 1s | 2akl | 3am bekol |
| 2ms | ta2kol | 3am btekol |
| 2fs | ta2kulina | 3am tekle |
| 3ms | yakulu | 3am yekol |
| 3fs | takul | 3am tekol |
| 2d | ta2kulani | 3am bteklo |
| 3md | yakulani | 3am byeklo |
| 3fd | na2kul | 3amnekol |
| 1p | takuluna | 3amteklo |
| 2mp | takuluna | 3amteklo |
| 2fp | takulna | 3am teklo |
| 3mp | yakuluna | 3ambyeklo |
| 3fp | yakulna | 3ambyeklo |

[3am means “in the process of” in Syriac]

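ARABIC vs Amioun vs Beirut

| Arabic | Amioun | Beirut |
|---|---|---|
| Akaltu | Kilt | Akalt |
| Akalta | Kilt | Akalt |
| Akalti | Kilte | Akalte |
| Akala | Akol | Akal |
| Akalat | Aklet | Akalet |
| Akaltuma | kelto | Akalto |
| Akalat | eklo | Akalo |
| Akalata | Aklo | Akalo |
| Akalna | kelna | Akalna |
| Akaltum | Kelto | Akalto |
| Akaltunna | Kelto | Akalto |
| Akaltu | eklo | Akalto |
| Akalna | eklo | Akalo |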

Note 1: To confuse matters further, linguists seem to claim that what they mean is that Lebanese doesn’t descend from the Southern Semitic Classical Arabic, but from some abstract hypothetical construction anachronistically called Central Semitic Proto-Arabic, itself very different from Classical Arabic. But calling this “Arabic” is what is confusing the crowd. Naming causes framing. So if they don’t mean “Arabic”…

Vocabulary

Some examples:
“Zammar 3a l’kou3” Levantine
“Zammar 3a kou3” Aramaic
“Inshud 3al mun3atif” Arabic

For A***le:
“Bu5sh tizo” Levantine
“Bu5sh tizo” Aramaic
“Thaqb iliatihi” Arabic (or mu2a55ara)

Note that the Lebanese army march (one-two-three) is in Syriac “7ad, Tr(n)en, Tlete, Arb3a” (not Wa7ad, Etnen, …).

(The tableau below starts with the standard list from Bennett. The orthography is not fully standardized. Lameen Souag has been nitpicking my list based on g->j, s<->sh, k->kh, etc. shifts in pronunciation, which, again, don’t make it part of a language group. For we say Yesou3 for Yeshou3 (Jesus), which comes from Aramaic (Arabic is 3issa), Juwwa from Aramaic bgaw, etc. The “j” can easily be Persian. It would be like classifying Mod. Hebrew as Germanic because of w->v, 7->ch, 3ayin->2aleph, and non-rolled rs. The more severe error is to conflate “descendance” and weak relation, since we are not dealing with independent variables. To repeat, 2alb vs lb or qalb is a wrong problem since qalb itself may come from lb: antecedence is a much, much stronger indicator than relation.)