A Short Language Profile and the Need for Chichewa Datasets
Author: Amelia V. Taylor, PhD
“Mutu Umodzi Susenza Denga” (Chewa proverb)
A single head cannot carry / lift up the roof.
What is Chichewa?
Africa has over 1.2 billion people who express themselves in numerous languages (some estimates put the number at over 3000!). Those familiar with Africa know that many people are conversant in more than two languages. Frequently, these are their own native dialect, a wider regional lingua franca, and an official or colonial-era language such as English or Portuguese.
Chichewa is part of the Niger-Congo Bantu group and is one of the most widely spoken indigenous languages of Africa. Chichewa is both an individual dialect and a language group, as we shall discuss in this short article.
The language, Chichewa, also written as Cichewa or, in Zambia, Cewa, is the native language of the Chewa. ‘Chi’ or ‘ci’ is a Bantu prefix attached to the tribal name to designate the language rather than the geographical region of the tribe. Chewa is the name of a group of people. Chichewa is also called Chinyanja, for example in Zambia and Mozambique. Chinyanja was also the old name for the language in Malawi, before the country became a republic. During that time, as a British protectorate, Malawi was called Nyasaland.
The map in Figure 1 shows the coverage of the Niger-Congo Bantu group of languages, which covers a large area of Africa south of the equator. Among the Niger-Congo B Bantu languages, Chichewa is spoken in much of southern, southeastern and eastern Africa.
Chichewa, with the code ‘ny’, is also one of the 13 African languages supported by Google’s automatic translation. The code ‘ny’ was most likely chosen because the language was first known as Chinyanja.
Its inclusion probably reflects the availability of written text in Chichewa compared to other African languages. However, as we will discuss in this article, there are several dialects of Chichewa which differ from each other in noticeable ways. I do not know whether this was taken into account in the text used for Google’s machine translation models. But this is a whole new interesting topic in itself!
Who are the Chewa?
The Chewa are a Bantu-speaking people, traditionally described as the descendants of the Maravi, who in the 16th (some say the 14th) century migrated to present-day Malawi from the region now called Congo-Kinshasa. Most of what we know about the migrations of the Chewa comes from oral tradition. Samuel Nthara collected some of the oral traditions in his book Mbiri ya Achewa, published in 1944. The name Maravi first appeared in Portuguese documents in 1661.
Nowadays, some of the well-known districts in Malawi where the Chewa live are Mchinji, Lilongwe, Kasungu, Nkhotakota, Dowa and Dedza. The consensus is that the Chewa of the mainland kept their name as Chewa and lived mainly in the Central Region. The Mang’anja are the Chewa who settled in the Southern Region, and some Chewa groups who settled at the lake or around the Shire River in the south are called Nyanja. Mang’anja (or Maganja) is southern Chichewa, as opposed to the language spoken in the Central Region (which was also called Western Chichewa / Nyanja). There are phonetic, grammatical and vocabulary differences between these dialects.
Where is Chichewa spoken?
In Malawi, Chichewa is widely understood. It was declared the national language in 1968 and it is viewed as a symbol of national unity by diverse groups.
In Mozambique it is spoken especially in the provinces of Tete and Niassa, where it is referred to as Chinyanja.
In Zambia, it is spoken in Lusaka and in the Eastern Province (the language is referred to as Nyanja). The language spoken in Lusaka is sometimes called town-Nyanja, as opposed to the Nyanja spoken in rural areas in other parts of Zambia, which is referred to as deep-Nyanja. Nyanja is the language of the police and the army.
In Zimbabwe, according to some estimates, Chichewa is the third most widely used language after Shona and Ndebele. There is a sizable community of descendants of those who migrated to this area from Nyasaland during colonial times to work in the mines.
Chichewa is also spoken in South Africa, where a significant number of migrants from Malawi work in mining, as domestic workers or in other industries.
There are radio services in Chichewa in Malawi, Zambia, South Africa and even in Ethiopia.
The map in Figure 2 below, from the Britannica website, displays the geographical areas where the Chewa live and so shows the approximate area where Chichewa is spoken. However, this map mainly shows the Chewa who live in the south and in the east of Malawi (see our short discussion above of the Chewa as a group of people).
The map in Figure 3 shows the ethnic groups of Malawi, among whom the Chewa, the Nyanja and the Mang’anja are the known native speakers of Chichewa.
How many people speak the language?
According to sources quoted in Wikipedia, there are 12 million native speakers of Chichewa. A similar number is mentioned on the Joshua Project website and includes Chichewa speakers from eight countries. This number therefore seems to refer to all the people who identify themselves as Chewa, Nyanja and Mang’anja; according to the Malawi Population Census of 2018, these groups make up about 40% of the population of Malawi. However, in Malawi, the large ethnic groups of Lomwe, Yao and Ngoni have over the course of time adopted Chichewa as their native language.
The number of people who understand and use Chichewa is therefore much higher than the 12 million native speakers. Like Swahili, Chichewa is considered by some to be a universal language, a common skill that enables people of different tribes, and people living in Malawi, Zambia and Mozambique, to communicate without following the strict grammar of specific local languages. In Zambia, many of those whose mother tongue is now Chinyanja have come to consider themselves Ngoni; Nyanja is a lingua franca, spoken by the police and the administration.
Lingua Franca in Malawi, Zambia and Elsewhere
During the days of Nyasaland, Chinyanja (as it was called then) and Chitumbuka were both used as official languages in education, media and the civil service. From the 1930s, during the colonial government, learning Chinyanja was encouraged in schools and was introduced as a subject in Grade 3. There was an attempt then to establish Chichewa as the lingua franca, but this project was abandoned following strong opposition.
During the 1966–68 period, following independence, Chinyanja, renamed Chichewa, was elevated to the status of national language. The first president, Dr. H. Kamuzu Banda, identified himself as a Chewa. He had a keen interest in linguistics and was involved in linguistic work abroad and within Malawi. While his speeches were mainly in English with a translation into Chichewa, he frequently corrected his translators and paused to discuss grammatical matters!
The Malawi Population Census of 1966 helped establish the status of the language as the lingua franca. The census suggested that Chichewa was the first language of almost 50% of the Malawian population, and more than three quarters understood it. As this was not a language use study, these percentages were questioned by some as they could not be truly verified. The elevation of Chichewa was seen as a political move, to the detriment of other native languages in use in Malawi (50% of the population used other languages in their own homes). After 1966, only Chichewa was allowed as the language of education, government, press and radio. This situation was resented by people speaking other languages, especially in the north of Malawi.
Another interesting point highlighted in the 1966 census was that only 4.9% of the population understood English well. Fewer than 1% claimed to use English in their homes. The situation is similar today, which underlines the importance of improving language understanding through digital and automated tools.
Vocabulary differences
Language use studies are complex and expensive to run. The last one conducted in Malawi was the Language Mapping Survey of 2009, directed by Pascal Kishindo. The study showed a high degree of use and understanding of various dialects of Chichewa among the diverse population, but also significant use of other mother tongues in homes.
Similarly, a large 1978 study published by the International African Institute looked at the comprehension of different native languages in Zambia, and found that Chichewa was the most commonly understood. The study showed that, linguistically, Chichewa shares large portions of its vocabulary with other main languages. For example, Cewa (Nyanja) was shown to share the following proportions of its vocabulary with other main languages: Bemba (57%), Tumbuka (49%), Senga (53%), Tonga (46%).
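As a rough illustration of how a shared-vocabulary proportion like those quoted above can be computed, the following Python sketch compares two word lists aligned by concept and reports the percentage of concepts for which the two words match. The concept labels and word forms are made-up placeholders, not data from the 1978 study, and exact string matching is a simplifying assumption; the way the survey itself judged two words to be shared is not described in this article.

```python
def shared_vocabulary_percent(list_a, list_b):
    """Percentage of concepts present in both lists whose word forms are identical."""
    common = set(list_a) & set(list_b)
    if not common:
        return 0.0
    shared = sum(1 for c in common if list_a[c].lower() == list_b[c].lower())
    return 100.0 * shared / len(common)


# Placeholder word lists keyed by concept (hypothetical, for illustration only).
lang_a = {"concept1": "aaa", "concept2": "bbb", "concept3": "ccc", "concept4": "ddd"}
lang_b = {"concept1": "aaa", "concept2": "bxb", "concept3": "ccc", "concept4": "dzd"}

print(shared_vocabulary_percent(lang_a, lang_b))  # 50.0 (2 of 4 concepts match)
```

In practice, related Bantu languages often share words that differ slightly in spelling or sound, so a real comparison would probably need a looser notion of "the same word" than exact equality.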
Language-Switching and Town Language
There have been very few language studies of the difference between the use of Chichewa in urban and in rural areas of Malawi. The anecdotal evidence is that town Chichewa is continuously absorbing new words, some of them imported from English. These can be words for which there are equivalent expressions in the vernacular, for example maoranges, mabananas, maflower and magati (gate). Others are technical in nature and had no equivalent in the vernacular, such as maunits, mabank and mafasebuki, or are civil service and office terms such as failo (file), komiti, lipoti (report), loya (lawyer) and ofesi (office). Changes in the language are also driven by the need to use new technological terms in the vernacular. For example, someone may say “Ndikupanga apudeti pa Fesibuku” (“I will update you on Facebook”) or “Ndinayesa kuchigugula”, which means “I tried to google it”.
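Borrowings such as maoranges or mabank follow a visible pattern: the prefix ma- attached to an unchanged English word. The Python sketch below flags such tokens as loanword candidates. It is a minimal heuristic, not an established tool: the small English word set and the function name are assumptions made for illustration, and a real system would need a full English lexicon and a check against genuine Chichewa words that happen to begin with ma-.

```python
# Tiny illustrative English lexicon (an assumption; a real tool would load a full word list).
ENGLISH_WORDS = {
    "orange", "oranges", "banana", "bananas",
    "flower", "unit", "units", "bank", "gate",
}


def find_ma_prefixed_loans(tokens, english_words=ENGLISH_WORDS):
    """Map each token of the form 'ma' + English word to the English word it contains."""
    loans = {}
    for token in tokens:
        word = token.lower()
        if word.startswith("ma") and word[2:] in english_words:
            loans[token] = word[2:]
    return loans


# Examples taken from the paragraph above.
sample = ["maoranges", "mabananas", "maflower", "maunits", "mabank", "magati", "komiti"]
print(find_ma_prefixed_loans(sample))
```

The heuristic finds the first five words but misses magati and komiti, because their spelling has been adapted to Chichewa. Catching respelled loans of this kind would need a pronunciation-based comparison or a hand-built mapping.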
Written text in Chichewa
The discussion about the language of literature in Africa is complex and worth looking into as a topic in its own right. The volume “Tongue and Mother Tongue: African Literature and the Perpetual Quest for Identity” contains several articles discussing the use of local languages in contemporary literature. I did not find a comprehensive comparison of literary genres and of how many works exist in each across different African languages. However, from the various sources consulted, it seems that Chichewa (and Bemba) have a reasonably large number of works of literature compared to other Bantu languages. That is probably one of the reasons for its inclusion in Google Translate.
The language is in use by young and old generations. In Malawi, for example, the Writers Union is active and poetry in Chichewa attracts a wide audience. Large media houses in Malawi, for example The Nation and Radio Maria, publish written articles in Chichewa. Radio and television frequently use Chichewa in their programs.
There is a body of research on the grammar and phonetics of Chichewa. It is important to note that most of the first grammatical studies and written texts in Chichewa / Chinyanja came from the territory of Malawi (the former Nyasaland). This is not surprising, because the first schools and language studies in the area were started by the missionaries who settled in Nyasaland in the mid to late 1870s. They engaged in writing down the language, developing the first dictionaries and translating English texts into the language, in particular the books of the Bible.
It is still the case that the most important (most widely read) and most widely available texts in Chichewa are religious texts. For some African languages these are the main content used for building datasets. This is not a disadvantage, as the Bible is a varied book containing history, poetry and even legal debates. However, it does have implications for the methodologies chosen for extracting language examples for datasets: these need to take into account the vocabulary used in various geographical areas, the genre of the text, and traditional versus contemporary language use.
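One simple way to keep these three factors in view is to store them as metadata with every text sample and to check how the collection is distributed across them. The Python sketch below is only an illustration under assumed names: the TextSample fields, the category labels and the placeholder sentences are hypothetical and do not describe any existing Chichewa dataset.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class TextSample:
    text: str
    region: str    # e.g. "central", "southern" (hypothetical labels)
    genre: str     # e.g. "religious", "news", "legal"
    register: str  # "traditional" or "contemporary"


def distribution(samples, field_name):
    """Share of samples for each value of the given metadata field."""
    counts = Counter(getattr(s, field_name) for s in samples)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}


# Placeholder records; real entries would hold actual sentences.
samples = [
    TextSample("<sentence 1>", "central", "religious", "traditional"),
    TextSample("<sentence 2>", "central", "religious", "traditional"),
    TextSample("<sentence 3>", "southern", "news", "contemporary"),
    TextSample("<sentence 4>", "central", "legal", "contemporary"),
]

print(distribution(samples, "genre"))
```

A report like this makes it easy to see when one genre, such as religious text, dominates a collection, so that other sources can be added before the data is used.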
Non-religious literature, business and other genres of written text in Chichewa
The need for a wider range of text in Chichewa is evident. During the recent COVID-19 outbreak, sanitation, personal hygiene and health-related information had to be quickly translated into the local languages. Articles in the local press suggested that Malawians would like their President to announce COVID-19 updates in Chichewa. Business and employee guidelines, insurance and banking contracts have been translated into Chichewa. Recently, the need for and the importance of translating the court proceedings surrounding the overturned presidential elections of 2019 in Malawi were debated in court and in the media. One of the parties to the case requested that the translations be stopped, as the process of translation took up too much time and there were many challenges in translating legal terms! However, the court rejected this request.
The Need for Datasets in Chichewa
As discussed, seven important facts provide impetus to the initiative to develop datasets for Chichewa: (1) Chichewa is an important African language, (2) it is representative of the Niger-Congo Bantu group of languages, (3) it is widely spoken, (4) it has a considerable literature, more than other local African languages, (5) there are several methodical studies of its grammar and phonetics, (6) there are several translations from languages such as English, and (7) it is spoken by old and young alike.
There has been an interest in developing digital tools for language documentation and natural language processing. Such initiatives have come from researchers involved in linguistics, such as those belonging to linguistics departments at universities in Malawi and Zambia. For example, in Malawi, we found a Chichewa monolingual dictionary corpus containing about 13,000 nouns and a short phonetically annotated corpus.
The comparative online Bantu dictionary at Berkeley includes a dataset for Chichewa; however, the project seems to have stalled in 1997. More recently, there has been an interest in creating datasets for use in NLP tools and machine translation. According to Professor Kishindo, there is a PhD candidate at the University of Malawi, Zangaphee Chris Chimombo, who is interested in working on machine translation for the language pair Chichewa and Yao.
From our investigation, we observe that these datasets or tools tend to be kept in the private domain, are not regularly maintained, or are used only once, and are not well documented. However, their existence is important and it shows that there is a desire and need for such tools.
Conclusions
Chichewa is an important African language. There are differences between the main dialects of Chichewa and the language is undergoing continuous change. Improved methods for discovering online content and digitizing text can open new opportunities for organising Chichewa text into useful corpora. These can then be useful in linguistic work, in building tools for manipulating and comparing text, for finding and visualising connections between texts and for improving machine translation.
Chichewa continues to change as new terms are added to the vocabulary, arising for example from technological needs. Its use by the younger generation creates new idioms and meanings, and creative expression through poetry and literature is finding venues online. Looking at language in novel ways using technology can also help engage the new generation in how they use, view and develop their language.
In this short article, we looked at the use of Chichewa and why we think it is important to build datasets for this language. We hope that this will motivate and inspire others who are interested in this language or in other African languages. This article was written as the author embarked on an AI4D Language Dataset Fellowship for putting together a Chichewa dataset. Here is another article on this initiative. This is a small but important initiative aimed at engaging with the Machine Learning generation on the African continent. I am honoured to play a small part in the building of such datasets.
Sources
1. Mbiri ya Achewa, Samuel Nthara, 1944.
2. Malawi Population Census, 2018.
3. Tribes, regions, and nationalism in democratic Malawi, Deborah Kaspin, Nomos, Vol. 39, Ethnicity and Group Rights (1997), pp. 464–503.
4. Joshua Project website.
5. A Grammar of the Chinyanja Language as Spoken at Lake Nyassa, Alexander Riddel, 1880.
6. Cyclopaedic Dictionary of the Mang’anja Language, Dr. D. C. Scott, 1892.
7. Sociolinguistic survey, Center for Language Studies, Chancellor College, Mtenje, 1996–1998.
8. Language Mapping Survey, Center for Language Studies, Chancellor College, Pascal Kishindo, 2009.
9. Archaeology and Oral Tradition in Malawi: Origins and Early History of the Chewa, Yusuf M. Juwayeyi, 2020.
10. Nyanja Linguistic Problems, T. Price, Africa: Journal of the International African Institute, Vol. 13, No. 2 (Apr. 1940), pp. 125–137.
11. Language in Zambia, Sirarpi Ohannessian and Mubanga E. Kashoki (Eds.), International African Institute, 1978.
12. Buku Loyera: An Introduction to the New Chichewa Bible Translation, Ernst Wendland, Kachere Monographs, CLAIM, Blantyre, 1998.
13. Tongue and Mother Tongue: African Literature and the Perpetual Quest for Identity, Pamela J. Olubunmi Smith and Daniel P. Kunene (Eds.), Africa World Press, 2002.