#DataDeepDive: Scripts & Languages of the Geniza
In this series, we take a deep dive into the Talk boards tags to look at how volunteers classify the fragments. You can read an overview of our Talk boards tags in the Sorting Phase Data review.
In our project, we ask volunteers to sort fragments into Hebrew and Arabic script, two scripts frequently found in the Geniza. But as many volunteers have found in sorting the Cairo Geniza, just because something is written in a script does not mean it is written in that language. Through the Talk board tags, knowledgeable volunteers have worked together to identify some of the languages (and other scripts!) that appear in the Geniza. The numbers following each tag here refer to the number of subjects with that tag, not the number of times the tag was used; sometimes the same tag was used multiple times on the same subject.
As noted in previous posts, this doesn’t mean that the subjects definitively are written in the script or language of the tag. Because we are looking at these tags out of context of their conversations, volunteers may have been guessing or suggesting the script or language of the subject. As our content specialists review this list, we hope to confirm these counts and provide detailed listings of these languages across Geniza collections.
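The counting rule described above (subjects per tag, rather than tag uses) can be illustrated with a minimal Python sketch. The subject IDs and tag uses below are hypothetical placeholders, not project data, and `subjects_per_tag` is an illustrative helper rather than part of any Zooniverse tool.

```python
from collections import defaultdict

# Hypothetical (subject_id, tag) pairs, one per tag use in a Talk comment.
# These IDs are made up for illustration only.
tag_uses = [
    (101, "ladino"),
    (101, "ladino"),
    (101, "ladino"),           # same tag used three times on one subject
    (102, "judeo-arabic"),
    (103, "judeo-arabic"),
    (103, "mixed_languages"),
]

def subjects_per_tag(uses):
    """Count distinct subjects per tag, ignoring repeat uses on one subject."""
    subjects = defaultdict(set)
    for subject_id, tag in uses:
        subjects[tag].add(subject_id)
    return {tag: len(ids) for tag, ids in subjects.items()}

counts = subjects_per_tag(tag_uses)
print(counts)
```

In this sketch, a tag applied three times to a single subject still counts once, which is how a tag can be used several times yet appear with a count of (1).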
Aramaic (37) is a Semitic language that is cognate with Hebrew. Hebrew script is derived from the Aramaic alphabet. (See Judeo-Aramaic for explanation.)
Judeo-Arabic (167) means the subject features Arabic text written in Hebrew script.
Judeo-Aramaic (20) means the subject features Aramaic text written in Hebrew script. (By the Middle Ages, Aramaic text would usually be written in Hebrew script — so scholars wouldn’t use this term. We’ve separated the two tags here to reflect volunteer input.)
Judeo-Persian (15) means the subject features Persian text written in Hebrew script.
Volunteers who could read Hebrew often tagged subjects as Judeo_Something (33), meaning a volunteer did not identify the language but suspected it was not Hebrew based on what they could translate.
Ladino (1), also known as Judeo-Spanish, is a language popular among Sephardic Jews. The tag was used three times in our project, but in context, the conversations made it clear that the subject was not Ladino. However, we did find at least one example of Ladino in the project so far that was not tagged:
While Hebrew script tags were frequent, volunteers also found non-Hebrew scripts in the project.
The Coptic (2) alphabet was first used for the Egyptian language, and it’s still used in Coptic liturgy today.
While we often find Hebrew script used for other languages, we had at least one case where Arabic script was used for a different language, tagged nonarabiclanguage (1). One of our moderators identified the Arabic script in Subject 12511220 as Ottoman Turkish.
In cases where multiple scripts or languages were found on a subject, volunteers used the tag mixed_languages (132). Subject 21953297 (below) has both Hebrew and Arabic script.
Volunteers have also found subjects that feature neither Hebrew nor Arabic script. Those subjects are out of scope for the transcription phase of this project, but they are important to note in recognizing the diversity of languages and scripts within Geniza collections.
Volunteers have found Roman/Latin script, tagged as English (21), Italian (8), and Latin_script (7).
In the sorting phase, volunteers identified at least one fragment tagged Russian (1), which is written in Cyrillic script, and Georgian (1), which has its own script; more have been found in the University of Manchester Library’s collection. Subject 12602863 features Russian, Georgian, and Arabic text, as well as a striking scene of a judge on its verso.
Subjects were also tagged as Greek (6), using the Greek alphabet. In Subject 21952244, a volunteer suggested that these Greek characters are likely magical gibberish, used to impress amulet customers.
👉 Read more Talk conversations or start your own by participating in Scribes of the Cairo Geniza on Zooniverse!